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class of natural products that includes many compounds 
possessing antibiotic or other pharmacological
properties, such as erythromycin, tetracyclines,
rapamycin, avermectin, monensin, epothilones and FK506.
In particular, polyketides are abundantly produced by
Streptomyces and related actinomycete bacteria. They are
synthesised by the repeated stepwise condensation of
acylthioesters in a manner analogous to that of fatty
acid biosynthesis. The greater structural diversity found
among natural polyketides arises from the selection of
(usually) acetate or propionate as "starter" or
"extender" units; and from the differing degree of
processing of the β-keto group observed after each
condensation. Examples of processing steps include
reduction to β-hydroxyacyl-, reduction followed by
dehydration to 2-enoyl-, and complete reduction to the
saturated acylthioester. The stereochemical outcome of
these processing steps is also specified for each cycle
of chain extension. In addition, the biosynthetic
pathways to many polyketides involve additional
enzyme-catalysed modifications which may include methylation by
O- and C-methyltransferases, hydroxylation by cytochrome
P450 enzymes, other oxidation or reduction processes, and
the biosynthesis and attachment of novel sugars and/or
deoxy sugars.

The biosynthesis of polyketides is initiated by a
group of chain-forming enzymes known as polyketide
synthases. Two classes of polyketide synthase (PKS) have
been described in actinomycetes. One class, named Type I
PKSs, represented by the PKSs for the macrolides
erythromycin, oleandomycin, avermectin and rapamycin,
consists of a different set or "module" of enzymes for
each cycle of polyketide chain extension. (For examples
see Cortes, J. et al. Nature (1990) 348:176-178; Donadio,
S. et al. Science (1991) 252:675-679; Swan, D.G. et al.
Mol. Gen. Genet. (1994) 242:358-362; MacNeil, D.J. et al.
Gene (1992) 115:119-125; Schwecke, T. et al. Proc. Natl.
Acad. Sci. USA (1995) 92:7839-7843.)

The term "extension module" as used herein refers to
the set of contiguous domains, from a β-ketoacyl-ACP
synthase ("KS") domain to the next acyl carrier protein
("ACP") domain, which accomplishes one cycle of
polyketide chain extension. The term "loading module" is
used to refer to any group of contiguous domains which
accomplishes the loading of the starter unit onto the PKS
and thus renders it available to the KS domain of the
first extension module. The length of polyketide formed
has been altered, in the case of erythromycin
biosynthesis, by specific relocation using genetic
engineering of the enzymatic domain of the
erythromycin-producing PKS that contains the chain releasing
thioesterase/cyclase activity (Cortes, J. et al. Science
(1995) 268:1487-1489; Kao, C.M. et al. J. Am. Chem. Soc.
(1995) 117:9105-9106).
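
By way of illustration only, the definitions of "loading module" and
"extension module" given above, together with the β-keto processing
steps described earlier, can be sketched in a few lines of Python. The
sketch is not part of the disclosure. The function names, the
abbreviations KR (ketoreductase), DH (dehydratase) and TE
(thioesterase), and the domain order in EXAMPLE_PKS are assumptions
introduced for the example and do not describe any particular synthase.

```python
from typing import List, Tuple


def split_modules(domains: List[str]) -> Tuple[List[str], List[List[str]], List[str]]:
    """Split a linear list of domain names into
    (loading module, extension modules, trailing domains).

    An extension module is taken to run from a KS domain to the next
    ACP domain, as defined in the text. Domains before the first KS
    are treated as the loading module; domains after the last
    extension module (for example a chain-releasing TE) are returned
    separately.
    """
    loading: List[str] = []
    modules: List[List[str]] = []
    trailing: List[str] = []
    current = None  # domains of the extension module being read, if any
    for name in domains:
        if current is not None:
            current.append(name)
            if name == "ACP":  # the next ACP closes the module
                modules.append(current)
                current = None
        elif name == "KS":  # a KS opens a new extension module
            current = [name]
        elif modules:
            trailing.append(name)
        else:
            loading.append(name)
    if current is not None:
        raise ValueError("KS domain without a closing ACP domain")
    return loading, modules, trailing


def beta_carbon_state(module: List[str]) -> str:
    """Predict the processing of the beta-keto group from the
    reductive domains present in one extension module."""
    present = set(module)
    if "KR" not in present:
        return "beta-keto"          # no reduction
    if "DH" not in present:
        return "beta-hydroxyacyl"   # reduction only
    if "ER" not in present:
        return "2-enoyl"            # reduction then dehydration
    return "saturated"              # complete reduction


# Hypothetical domain order, chosen only to exercise each case.
EXAMPLE_PKS = [
    "AT", "ACP",                          # loading module
    "KS", "AT", "KR", "ACP",              # extension module 1
    "KS", "AT", "DH", "ER", "KR", "ACP",  # extension module 2
    "KS", "AT", "ACP",                    # extension module 3
    "TE",                                 # chain release
]

if __name__ == "__main__":
    loading, modules, trailing = split_modules(EXAMPLE_PKS)
    print("loading module:", loading)
    for number, module in enumerate(modules, start=1):
        print(number, module, beta_carbon_state(module))
    print("trailing domains:", trailing)
```

In this simplified model, relocating the chain-releasing domain so
that it follows an earlier extension module corresponds to truncating
the list of modules, which is one way to picture how the length of the
polyketide formed can be altered.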
In-frame deletion of the DNA encoding part of the
ketoreductase domain in module 5 of the
erythromycin-producing PKS (also known as 6-deoxyerythronolide B
synthase, DEBS) has been shown to lead to the formation
of erythromycin analogues
5,6-dideoxy-3-α-mycarosyl-5-oxoerythronolide B,
5,6-dideoxy-5-oxoerythronolide B and
5,6-dideoxy, 6-β-epoxy-5-oxoerythronolide B (Donadio, S.
et al. Science (1991) 252:675-679). Likewise, alteration
of active site residues in the enoylreductase domain of
module 4 in DEBS, by genetic engineering of the
corresponding PKS-encoding DNA and its introduction into
Saccharopolyspora erythraea, led to the production of
6,7-anhydroerythromycin C (Donadio, S. et al. Proc. Natl.
Acad. Sci. USA (1993) 90:7119-7123).

International Patent Application number WO 93/13663
describes additional types of genetic manipulation of the
DEBS genes that are capable of producing altered
polyketides. However many such attempts are reported to
have been unproductive (Hutchinson, C.R. and Fujii, I.
Annu. Rev. Microbiol. (1995) 49:201-238, at p. 231). The
complete DNA sequence of the genes from Streptomyces

multi-enzyme component (DEBS1) for the erythromycin PKS
in place of the normal loading module. Certain novel
polyketides can be prepared using the hybrid PKS gene
assembly, as described for example in WO 98/01571.

WO 98/01546 further describes the construction of a
hybrid PKS gene assembly by grafting the loading module
for the rapamycin-producing polyketide synthase onto the
first multi-enzyme component (DEBS1) for the erythromycin
PKS in place of the normal loading module. The loading
module of the rapamycin PKS differs from the loading
modules of DEBS and the avermectin PKS in that it
comprises a CoA ligase domain, an enoylreductase ("ER")
domain and an ACP, so that suitable organic acids
including the natural starter unit
3,4-dihydroxycyclohexane carboxylic acid may be activated in
situ on the PKS loading domain and, with or without
reduction by the ER domain, transferred to the ACP for
intramolecular loading of the KS of extension module 1
(Schwecke, T. et al. Proc. Natl. Acad. Sci. USA (1995)
92:7839-7843). WO 98/51695 and WO 98/49315 describe
additional types of genetic manipulation of the DEBS
genes that are capable of producing altered polyketides.

The second class of PKS, named Type II PKSs, is
represented by the synthases for aromatic compounds. Type
II PKSs contain only a single set of enzymatic activities
for chain extension and these are re-used as appropriate
in successive cycles (Bibb, M.J. et al. EMBO J. (1989)
8:2727-2736; Sherman, D.H. et al. EMBO J. (1989)
8:2717-2725; Fernandez-Moreno, M.A. et al. J. Biol. Chem. (1992)
267:19278-19290). The "extender" units for the Type II
PKSs are usually acetate units, and the presence of
specific cyclases dictates the preferred pathway for
cyclisation of the completed chain into an aromatic
product (Hutchinson, C.R. and Fujii, I. Annu. Rev.
Microbiol. (1995) 49:201-238). Hybrid polyketides have
been obtained by the introduction of cloned Type II PKS
gene-containing DNA into another strain containing a
different Type II PKS gene cluster, for example by
introduction of DNA derived from the gene cluster for
actinorhodin, a blue-pigmented polyketide from
Streptomyces coelicolor, into an anthraquinone
polyketide-producing strain of Streptomyces galilaeus
(Bartel, P.L. et al. J. Bacteriol. (1990) 172:4816-4826).

The minimal number of domains required for
polyketide chain extension on a Type II PKS when
expressed in a Streptomyces coelicolor host cell (the
"minimal PKS") has been defined for example in WO
95/08548 as containing the following three polypeptides
which are products of the actI genes: firstly KS;
secondly a polypeptide termed the CLF with end-to-end
amino acid sequence similarity to the KS but in which the
essential active site residue of the KS, namely a
cysteine residue, is substituted either by a glutamine
residue or, in the case of the PKS for a spore pigment
such as the whiE gene product (Davis, N.K. and Chater,
K.F. Mol. Microbiol. (1990) 4:1679-1691) by a glutamic
acid residue; and finally an ACP. The CLF has been stated
(for example in WO 95/08548) to be a factor that
determines the chain length of the polyketide chain that
is produced by the minimal PKS. However it has been found
(Shen, B. et al. J. Am. Chem. Soc. (1995) 117:6811-6821)
that when the CLF for the octaketide actinorhodin is used
to replace the CLF for the decaketide tetracenomycin in
host cells of Streptomyces glaucescens, the polyketide
product is not found to be altered from a decaketide to
an octaketide, so the exact role of the CLF remains
unclear. An alternative nomenclature has been proposed in
which KS is designated KSα and CLF is designated KSβ, to
reflect this lack of knowledge (Meurer, G. et al.
Chemistry & Biology (1997) 4:433-443). The mechanism by
which acetate "starter" units and acetate "extender" units
are loaded onto the Type II PKS is not known, but it is
speculated that the malonyl-CoA:ACP acyltransferase of
the fatty acid synthase of the host cell can fulfil the
same function for the Type II PKS (Revill, W.P. et al. J.
Bacteriol. (1995) 177:3946-3952).

WO 95/08548 describes the replacement of
actinorhodin PKS genes by heterologous DNA from other
Type II PKS gene clusters, to obtain hybrid polyketides.
It also describes the construction of a strain of
Streptomyces coelicolor which substantially lacks the
native gene cluster for actinorhodin, and the use in that
strain of a plasmid vector pRM5 derived from the low-copy
number vector SCP2* isolated from Streptomyces coelicolor
(Bibb, M.J. and Hopwood, D.A. J. Gen. Microbiol. (1981)
126:427-442) and in which heterologous PKS-encoding DNA
may be expressed under the control of the divergent actI/
actIII promoter region of the actinorhodin gene cluster
(Fernandez-Moreno, M.A. et al. J. Biol. Chem. (1992)
267:19278-19290). The plasmid pRM5 also contains DNA from
the actinorhodin biosynthetic gene cluster encoding the
gene for a specific activator protein, ActII-orf4. The
ActII-orf4 protein is required for transcription of the
genes placed under the control of the actI/actIII
bidirectional promoter and activates gene expression
during the transition from growth to stationary phase in
the vegetative mycelium (Hallam, S.E. et al. Gene (1988)
74:305-320).

Type II clusters in Streptomyces are known to be
activated by pathway-specific activator genes (Narva,
K.E. and Feitelson, J.S. J. Bacteriol. (1990)
172:326-333; Stutzman-Engwall, K.J. et al. J. Bacteriol. (1992)
174:144-154; Fernandez-Moreno, M.A. et al. Cell (1991)
66:769-780; Takano, E. et al. Mol. Microbiol. (1992)
6:2797-2804; Gramajo, H.C. et al. Mol. Microbiol. (1993)
7:837-845). The DnrI gene product complements a mutation
in the actII-orf4 gene of S. coelicolor, implying that
DnrI and ActII-orf4 proteins act on similar targets. A
gene (srmR) has been described (EP 0 524 832 A2) that is
located near the Type I PKS gene cluster for the
macrolide polyketide spiramycin. This gene specifically
activates the production of the macrolide antibiotic
spiramycin, but no other examples have been found of such
a gene. Also, no homologues of the ActII-orf4/DnrI/RedD
family of activators have been described that act on Type
I PKS genes. WO 98/01546 describes the use of the
ActII-orf4 family of activators in conjunction with their
cognate promoters (e.g. actII-orf4 with the actI promoter)
in a heterologous actinomycete to obtain high level
expression of recombinant Type I polyketide synthase
genes.

Although large numbers of therapeutically important
polyketides have been identified, there remains a need to
obtain novel polyketides that have enhanced properties or
possess completely novel bioactivity. The complex
polyketides produced by Type I PKSs are particularly
valuable, in that they include compounds with known
utility as anthelminthics, insecticides,
immunosuppressants, antifungal agents or antibacterial
agents. Because of their structural complexity, such
novel polyketides are not readily obtainable by total
chemical synthesis, nor by chemical modifications of
known polyketides.

There is also a need to develop reliable and
specific ways of deploying individual genes and portions
of genes in practice so that all, or a large fraction, of
hybrid PKS genes that are constructed are viable and
produce the desired polyketide product. This includes the
development of advantageous host strains for expression
of such genes. For example many polyketides are rendered
bioactive by the action of further enzymes other than the
polyketide synthase, and host strains that contain and
are able to express the genes for such enzymes are
particularly convenient for the efficient synthesis of
the bioactive material. In those cases where the
construction of a known or a novel polyketide requires
specialised precursors, host strains containing and able
to express the genes for key enzymes that enhance the
production of such specialised precursors are equally
valuable and desirable. There is also a need to develop
rational methods of increasing the expression level of
all the genes required for production of a specific
polyketide. Clearly also a host cell which is
advantageous for the above reasons, and/or because of
other favourable characteristics including but not
limited to its speed of growth, excellent handling
characteristics in fermentation, and ease of
transformation with DNA by various techniques, can be
made even more favourable by the cloning into that cell
of such auxiliary genes for polyketide modification, or
gene activation, or post-translational modification, or
precursor supply.



The DNA sequences have been disclosed for several
Type I PKS gene clusters that govern the production of
16-membered macrolide polyketides, including the tylosin
PKS from Streptomyces fradiae (application EP 0 791 655
A2), the niddamycin PKS from Streptomyces caelestis
(Kakavas, S.J. et al. J. Bacteriol. (1997) 179:7515-7522)
and the spiramycin PKS from Streptomyces ambofaciens
(application EP 0 791 655 A2). DNA sequences have also
been disclosed for Type I PKS gene clusters that govern
the production of further complex polyketides, for
example rifamycin from Amycolatopsis mediterranei (WO
98/07868), and soraphen from Sorangium cellulosum (US
5716849), but so far no DNA sequence has been disclosed
for one of the most widespread and important classes of
complex polyketides, the polyethers.

Polyethers form an important group of complex
polyketide antibiotics (Westley, J.W. in "Antibiotics IV.
Biosynthesis" (Corcoran, J.W. Ed.), Springer-Verlag, New
York (1981) p. 41-73). They are polyoxygenated carboxylic
acids which act as selective ionophores transporting
cations across the cell membrane of target cells and
thereby causing depolarisation and cell death. Certain
polyethers including monensin, lasalocid and tetronasin
are in widespread use in animal husbandry as
coccidiostats (principally targeted against Eimeria
spp.) and as growth promoters. Polyethers have also been
reported to be active in vitro and in vivo against the
malarial parasite Plasmodium falciparum (Gumila, C. et
al. Antimicrobial Agents and Chemotherapy (1997)
41:523-529).

Polyethers contain multiple asymmetric centres and
are characterised by the presence of tetrahydrofuran and
tetrahydropyran rings, producing a characteristic shape
which is non-polar on its outer surface and therefore
well adapted for transport of material across bacterial
membranes; and provides on its inner surface polar
coordinating ligands for a centrally-bound metal ion. In
addition to tetrahydrofuran and tetrahydropyran rings,
other groups which are often present include spiroketal,
dispiroketal, and substituted benzoic acid moieties and
occasionally other groups for example a tetronic acid or
a 6-membered carbocyclic ring.

Monensins A and B are produced by the actinomycete
Streptomyces cinnamonensis. Their structures are shown in
Figure 1. Monensin B differs from monensin A only in the
presence of a methyl sidechain at C-16 rather than an
ethyl sidechain. Monensin selectively binds and
transports sodium ions. In addition to its antibacterial
and antifungal properties monensin has some activity
against protozoal parasites such as the malarial parasite
Plasmodium falciparum. Although the structures of
polyethers differ significantly from those of other
complex polyketides such as the polyhydroxylated and
polyene macrolides, their biosynthesis appears to take
place by a metabolic pathway which has many common
elements. Thus experiments using carbon 14-labelled
precursors have shown that monensin A is synthesised from
five acetate, one butyrate and seven propionate units
(Day, L.E. et al. Antimicrob. Agents Chemother. (1973)
4:410-414). Similarly experiments using precursors
doubly-labelled with carbon-13 and oxygen-18 have shown
that oxygens (O)1, (O)3, (O)4, (O)5, (O)6 and (O)10 of
monensin arise from the carboxylate oxygens of either
propionate or acetate, while growth in the presence of
oxygen-18 oxygen gas demonstrated that the three
remaining ether oxygens (O)7, (O)8 and (O)9 are derived
from molecular oxygen (Cane, D.E. et al. J. Am. Chem.
Soc. (1981) 103:5962-5965; Cane, D.E. et al. J. Am. Chem.
Soc. (1982) 104:7274-7281; Ajaz, A.A. and Robinson,
J.A. J. Chem. Soc. Chem. Commun. (1983) 12:679-680).
These findings have been rationalised by proposing that
the biosynthesis of monensin proceeds via an acyclic
triene intermediate (1) in which the geometry of all
three carbon-carbon double bonds is E (entgegen) rather
than Z (zusammen). The triene is then proposed to be
subject to epoxidation to a tri-epoxide (2) and then ring
opening is proposed to occur with concomitant sequential
formation of the five ether rings as shown in Figure 2A.
Such a biosynthetic pathway, first mooted by Westley in
1974 (Westley, J.W. et al. J. Antibiot. (1974)
27:597-604) accounts for the observed stereochemistry at the
multiple asymmetric centres in monensin (Cane, D.E. et
al. J. Am. Chem. Soc. (1982) 104:7274-7281; Sood, G.R. et
al. J. Chem. Soc. Chem. Commun. (1984) 21:1421-1424) and
analogous schemes can be used to account for the
biosynthesis of other known polyethers such as lasalocid
A (Hutchinson, C.R. et al. J. Am. Chem. Soc. (1981)
103:5953-5956), tetronasin (ICI 139603) (Demetriadou,
A.K. et al. J. Chem. Soc. Chem. Commun. (1985) 7:408-410)
and narasin (Spavold, Z. et al. Tetrahedron Letters
(1986) 27:3299-3302). The hydroxylation at C-26 and the
introduction of an O-methyl group on oxygen 3 are
proposed to occur as late steps in the biosynthesis,
after formation of the polyether structure.
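
The precursor stoichiometry reported above for monensin A (five
acetate, one butyrate and seven propionate units) can be turned into a
simple arithmetic check. The Python sketch below is offered for
illustration only and is not part of the disclosure. It assumes that
each unit is incorporated intact with its full carbon count (acetate
C2, propionate C3, butyrate C4), that exactly one of the units serves
as the starter, and that every further unit is added by one
condensation. The names CARBONS, MONENSIN_A_UNITS and chain_summary
are introduced for the example.

```python
from typing import Dict, Tuple

# Carbon atoms contributed by each precursor acid (assumed intact incorporation).
CARBONS: Dict[str, int] = {"acetate": 2, "propionate": 3, "butyrate": 4}

# Unit counts for monensin A as stated in the text.
MONENSIN_A_UNITS: Dict[str, int] = {"acetate": 5, "butyrate": 1, "propionate": 7}


def chain_summary(units: Dict[str, int]) -> Tuple[int, int, int]:
    """Return (total units, condensations, backbone carbons).

    One unit is taken to be the starter, so the number of
    condensations (chain-extension cycles) is one less than the
    total number of units.
    """
    total_units = sum(units.values())
    condensations = total_units - 1
    backbone_carbons = sum(CARBONS[name] * count for name, count in units.items())
    return total_units, condensations, backbone_carbons


if __name__ == "__main__":
    total, condensations, carbons = chain_summary(MONENSIN_A_UNITS)
    print(f"{total} units, {condensations} condensations, {carbons} carbons")
```

Under these assumptions the thirteen units correspond to twelve
chain-extension cycles and account for 35 carbon atoms; the O-methyl
group mentioned above as a late modification would not be counted
among them.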

Unfortunately key aspects of the biosynthetic scheme
shown in Figure 2A have so far eluded experimental
confirmation. No biosynthetic intermediates have been
isolated from mutants of S. cinnamonensis that are
blocked in early stages of monensin production.
26-deoxymonensin A has been isolated from a S. cinnamonensis
mutant partially blocked in monensin production
(Ashworth, D.M. et al. J. Antibiot. (1989) 42:1088-1099)
and 3-O-demethylmonensins A and B have been recovered as
minor components from the fermentation broth of a
monensin-producing strain (Pospisil, S. et al. J.
Antibiot. (1987) 40:555-557). When fed to cells of S.
cinnamonensis in radio-labelled form, neither
26-deoxymonensin A, nor 3-O-demethylmonensin A, nor
3-O-demethyl-26-deoxymonensin A was significantly
incorporated into monensin A (Ashworth, D.M. et al. J.
Antibiot. (1989) 42:1088-1099), either because they are
actively excluded or because these modifications in fact
occur earlier in the biosynthetic pathway so that these
metabolites are shunt products not readily converted into
the final antibiotic by the respective hydroxylase or
methyltransferase. Similarly, the putative all-(E)-triene
precursor (1) has been synthesised and shown not to
become incorporated into monensin when fed to growing
cells of S. cinnamonensis (Holmes, D.S. et al. Helv.
Chim. Acta (1990) 73:239-259). An alternative pathway has
been proposed, as shown in Fig 2B, based on the
transition-metal-mediated oxidation of 1,5-dienes (Walba,
D.M. and Edwards, P.D. Tetrahedron Lett. (1980)
21:3531-3534). The triene intermediate (4) would differ from
that of Figure 2A (1) only in that each carbon-carbon
double bond would have the (Z)-configuration (Townsend,
C.A. and Basak, A. Tetrahedron (1991) 47:2591-2602) and
not the (E)-configuration.

The genetic basis of secondary metabolite
biosynthesis essentially exists in the genes which code
for the individual biosynthetic enzymes and in the
regulatory elements which control the expression of the
biosynthetic genes. The genes encoding biosynthesis of
polyketides in actinomycetes have hitherto been found as
clusters of adjacent genes, ranging in size from
20 kilobasepairs (kbp) to over 100 kbp. The clusters
often contain specific regulatory genes and genes
conferring resistance of the producing strain to its own
antibiotic.

In various of its aspects the invention provides the
following:

(1) a DNA sequence encoding at least one peptide
necessary for the biosynthesis of monensin, preferably
comprising one or more of the following genes: mon BI,
mon BII, mon CI, mon CII, mon H, mon RI, mon RII, mon T,
mon AIX and mon AX as depicted in the appended sequence
data or an allele or mutation thereof;

(2) a DNA sequence according to the first aspect
comprising all of the genes listed therein or an allele
or mutation thereof;

(3) a DNA sequence according to the first aspect
comprising the complete monensin gene cluster;

(4) a DNA sequence coding for one or more of the
peptides set out below, said peptide having the amino
acid sequence as set out in the appended sequence data or
being a variant thereof having the specified activity:

peptide      activity

mon CII      epoxyhydrolase/cyclase
mon E        S-adenosylmethionine-dependent methyltransferase
mon T        monensin resistance gene
mon RII      repressor protein
mon AIX      thioesterase
mon AI       polyketide synthase multienzyme
mon AII      polyketide synthase multienzyme
mon AIII     polyketide synthase multienzyme
mon AIV      polyketide synthase multienzyme
mon AVI      polyketide synthase multienzyme
mon AVII     polyketide synthase multienzyme
mon AVIII    polyketide synthase multienzyme
mon H        regulatory protein
mon CI       flavin-dependent epoxidase
mon BII      carbon-carbon double bond isomerase
mon BI       carbon-carbon double bond isomerase
mon D        cytochrome P450 hydroxylase
mon RI       activator protein
mon AX       thioesterase

(5) a recombinant cloning or expression vector
comprising a DNA sequence according to any of aspects 1-4;

(6) a transformant host cell which has been
transformed to contain a DNA sequence according to any of
aspects 1-4 and is capable of expressing a corresponding
peptide;

(7) a hybridization probe comprising a polynucleotide
which binds specifically to a region of the monensin gene
cluster selected from mon BI, mon BII, mon CI, mon CII,
mon H, mon RI, mon RII, mon T, mon AIX and mon AX;

(8) use of a probe according to aspect (7) in a
method of detecting the presence of a gene cluster which
governs the synthesis of a polyether, and optionally
isolating a gene cluster detected thereby;

(9) use of a probe comprising a polynucleotide which
binds specifically to a gene responsible for levels of
activity of the monensin gene cluster, preferably a
regulatory gene, resistance gene or thioesterase gene,
more preferably the regulatory gene mon RI, in a method of
detecting an analogous gene in a gene cluster of another
polyketide, preferably a polyether, and optionally
manipulating the gene detected thereby to alter the level
of expression of said other polyketide;

(10) a host cell, preferably Streptomyces
cinnamonensis, containing a heterologous gene under the
control of the mon RI gene and a monensin promoter;

(11) use of a portion of the monensin gene cluster
having chain terminating activity, preferably comprising
at least one of mon AIX and mon AX or a mutant or allele
thereof having chain terminating activity, to effect chain
release of a peptide other than one required for monensin
biosynthesis;

(12) use of a portion of the monensin gene cluster
having carbon-carbon double bond isomerase activity,
preferably comprising at least one of mon BI and mon BII
or a mutant or allele thereof having isomerase activity to
provide a desired stereochemical outcome in the synthesis
of a polyketide other than monensin;

(13) a polypeptide encoded by a portion of the
monensin gene cluster, preferably comprising at least one
of mon BI and mon BII or a mutant or allele thereof,
having carbon-carbon double bond isomerase activity;

(14) an epoxidase enzyme encoded by mon CI or a
derivative or variant thereof having epoxidase activity;

(15) a cyclase enzyme encoded by mon CII or a
derivative or variant thereof having cyclase activity.

Some embodiments of the invention will now be 
described by way of example with reference to the 
accompanying drawings in which: 
Fig 1 shows the structure of monensins A and B;
Fig 2 illustrates proposed biosynthetic pathways;
Fig 3 illustrates the proposed organization of the
monensin polyketide synthase (PKS) enzyme complex; and
Fig 4 illustrates the proposed organization of the
monensin biosynthetic gene cluster.

The overall gene organization of the monensin
biosynthetic gene cluster, as shown in Fig 4, is similar
to that previously found for many macrolide biosynthetic
gene clusters, which have one or more open reading frames
(ORFs) encoding large multifunctional PKSs flanked by
other genes which encode functions required for the
biosynthesis of the antibiotic. In the case of monensin,
there is an unusually high number of distinct ORFs
encoding PKS multi-enzymes (eight in total, labelled monAI
to monAVIII) but there is again a separate module of
enzymes for each cycle of polyketide chain extension,
exactly as found for modular PKSs for macrolide
biosynthesis (see Fig 3). Thus there are 12 condensations
predicted to be required for the production of the carbon
skeleton of monensin, and in agreement with this there are
found to be 12 extension modules of PKS enzymes
distributed among the 8 PKS ORFs. However, as mentioned in
detail below, the other genes in the monensin cluster
include genes which have not previously been found in any
other gene cluster for the biosynthesis of a complex
polyketide, and which are not significantly similar to any
genes in published sequence databases. The cloned DNA for
these genes is useful to allow the diagnosis that a
polyketide biosynthetic gene cluster in any actinomycete,
uncovered previously by conventional hybridization against
a PKS gene probe from (say) the DEBS or some other
characterised PKS gene cluster, is one that governs the
synthesis of a polyether; and these genes are also
valuable either singly or in combination as specific
hybridization probes for the specific detection and
isolation of additional polyether biosynthetic gene
clusters. Examples of these previously-unknown genes are
the genes monBI, monBII, monCI and monCII. In addition the
regulatory genes monH, monRI and monRII and the resistance
gene monT and the thioesterase genes monAIX and monAX are
all useful for the detection of analogous genes in other
polyether clusters which are required for the rational
manipulation of such genes in order to increase levels of
the specific product.

WO 01/68867    PCT/GB00/02072

The cloned and sequenced cluster of genes for
monensin biosynthesis is useful secondly in the
engineering of mutant strains of S. cinnamonensis and of
other actinomycetes which are suitable strains for the
high level production of either natural or novel
recombinant polyketides. The sequence of the monensin
cluster disclosed here shows the surprising fact that the
gene cluster contains a gene monRI whose gene product has
an amino acid sequence highly similar to that of
actII-orf4, the pathway-specific activator gene which activates
the actI and other promoters of the actinorhodin
biosynthetic gene cluster of Streptomyces coelicolor. The
recognition of this aspect of the natural regulation of a
Type I PKS cluster is important and valuable because
first, it is possible to increase the yield of monensin by
increasing the level of the activator MonRI, either by
placing the gene monRI under the control of a powerful
promoter or arranging for the presence within the cells of
one or more additional copies of the monRI gene (as
exemplified below); secondly, it will be possible to use
the monRI gene as a specific hybridisation probe to locate
similar genes in other complex PKS gene clusters,
especially other polyether PKS gene clusters but also
polyene and macrolide gene clusters and all other Type I
modular PKS gene clusters; even in cases where (as for
rapamycin and erythromycin) no such gene has been
previously found within the currently accepted physical
limits of the relevant biosynthetic gene cluster. In such
cases the monRI gene probe might be expected to uncover
the activator even if it resides on the chromosome at some
distance from the main body of the gene cluster; and
simple experiments would then show whether the
activator(s) so uncovered are involved in regulation of
the biosynthesis of those particular metabolites; thirdly,
increasing the copy number of the monRI gene or of any of
the activator genes uncovered will tend to increase the
yield of a heterologous polyketide by "crosstalk" where
the activator mimics the presence of the normal activator
for the transcription of the genes for that heterologous
polyketide synthase. It is clear from recently published
25 work (Wietzorrek, A. and Bibb, M. Mol. Microbiol- (1997) 
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25:1181-1184) that the ActII-orf4 family of activators 
exert their effects by binding to promoter regions within 
the target gene cluster, so it will be possible to use the 
monJ^J gene together with monensin promoter regions to 
5 drive the high-level transcription and transTation of 

heterologous genes in Streptomyces cinnamonensiSr and 
perhaps in other host strains too; such genes need not be 
PKS genes or even involved in polyketide biosynthesis. 
Monensin promoter regions are found at the 5' end of genes 

10 or groups of genes in the cluster and their location is 

clear from the sequence analysis disclosed here. Thus a 
useful vector would provide the monensin promoter and the 
ribosome binding site and continue up to the start of the 
open reading frame, after which the monensin ORF naturally 

15 found there would be replaced by the heterologous gene. 

The relative strength of the monensin promoters can be
readily determined using any one of a number of known
promoter probes, i.e. genes whose expression gives rise to
readily measurable and quantifiable effects, such as Green
Fluorescent Protein (GFP), or beta-galactosidase in the
presence of a chromogenic substrate. It should be possible
to mutate randomly the small region of the monensin
promoters especially likely to interact with the MonRI
activator (identified by the presence of tandem
heptanucleotide repeats with a common consensus sequence
between the various monensin promoters) (Wietzorrek, A.
and Bibb, M. Mol. Microbiol. (1997) 25:1181-1184), and to
determine the optimal DNA sequence for the maximal
activation effect using either S. cinnamonensis
(preferably, in case there are other unknown factors that
make the activation function better in this strain than in
other heterologous systems) or another host actinomycete
strain. If the natural monensin promoters were mutated to
have this optimal recognition sequence, then this would
further increase the production of monensin. By extension,
the use of this modified monensin promoter in conjunction
with the monRI gene in heterologous systems could form the
basis of further improvements in expression of polyketide
synthases or other genes, either by appropriate
chromosomal alterations to introduce the altered promoter
and also the monRI gene, or by provision of vectors
containing these optimised signals linked to specific
genes and housed in suitable host cells.
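
The tandem heptanucleotide repeats referred to above can be
located computationally before any mutagenesis is planned.
The following is a minimal sketch only: the function name
tandem_heptamers, the one-mismatch tolerance and the example
fragment are hypothetical illustrations and are not taken
from the monensin sequence listing.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def tandem_heptamers(seq: str, max_mismatch: int = 1):
    """Return (position, first_unit, second_unit) for every pair of
    directly adjacent 7-mers that differ at no more than max_mismatch
    positions. Overlapping frames of one repeat may each be reported."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 13):
        first, second = seq[i:i + 7], seq[i + 7:i + 14]
        if hamming(first, second) <= max_mismatch:
            hits.append((i, first, second))
    return hits


# Hypothetical promoter fragment (NOT from the monensin cluster):
# two near-identical heptamers, TTCGAGG and TTCGAGC, side by side.
example = "GGCA" + "TTCGAGG" + "TTCGAGC" + "TACC"

for position, first, second in tandem_heptamers(example):
    print(position, first, second)
```

Candidate regions found in this way would still need to be
compared between the various promoters to derive a consensus
sequence, which this sketch does not attempt.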

The sequencing of the monensin cluster has uncovered
another strategy for gene regulation in such Type I
clusters. The previously-sequenced genes for the rapamycin
biosynthetic pathway in Streptomyces hygroscopicus
included a gene of unknown function (rapH). A closely
similar gene has now been found in the monensin
biosynthetic gene cluster (monH), and it is clear from
this recurrence (and the comparison of the sequences with
those of database proteins) that this gene is potentially
an important DNA-binding sensor gene which acts to
regulate the transcription of the cluster in concert with
other regulatory signals. Simple experimentation is needed
in order to define whether the gene is an activator, in
which case putting in another copy or increasing its
transcription will have the potential to increase
polyketide biosynthesis; or alternatively the rapH gene
product may be a negative regulator, whereupon deletion of
this gene may release the biosynthetic pathway from this
inhibitory effect and increase yields.

There is a continuing need to develop new methods of
high-level production of bioactive metabolites and other
valuable gene products in actinomycetes. Streptomyces
cinnamonensis is a recognised and very valuable industrial
strain for the production of very high levels of monensin;
it is readily transformable with DNA by standard methods
of conjugation or of protoplast transformation; it is a
host for numerous known "broad range" plasmids including
well-known expression plasmids of both high- and low-copy
number; it also grows quickly relative to other
actinomycete strains (for example about three times faster
than wild type Saccharopolyspora erythraea, the
erythromycin producer, under comparable conditions) and
sporulates relatively easily. Heterologous polyketides can
be expressed in Streptomyces cinnamonensis using for
example the low-copy number plasmid pCJR24 (which has no
origin of replication active in actinomycetes, so is
maintained by integration into the chromosome) (Rowe, C.
et al. Gene (1998) 216:215-223) or the related plasmid
pCJR29 in which the polyketide synthase gene(s) are placed
under the control of the actI promoter which is activated
by the ActII-orf4 activator; or alternatively the monAI
promoter can be substituted together with the MonRI
activator; or some other pairing of activator and cognate
promoter chosen from either a Type II or a Type I
polyketide synthase gene cluster. As an example, the wild
type strain of Streptomyces cinnamonensis has been used to
express the plasmid pCJR29 (Rowe, C. et al. Gene (1998)
216:215-223) containing as insert the three ORFs for the
PKS governing the production of 6-deoxyerythronolide B,
the macrolide precursor of erythromycin A in
Saccharopolyspora erythraea, these genes being placed
under the control of the pathway-specific actI promoter
from Streptomyces coelicolor together with its cognate
activator gene actII-orf4. The transformed strain when
cultivated in a suitable liquid medium produced
6-deoxyerythronolide B in good yield.

It is well known to the person skilled in the art
that it is possible to use standard vectors unable to
replicate in actinomycetes to introduce DNA into a
Streptomyces cell, such DNA comprising two portions of
contiguous DNA which are each identical to one of two
portions of the cell's chromosome that are spaced up to
100 kbp apart; and that through recombination between the
incoming DNA and the chromosome occurring in both portions
of DNA the net result is that the chromosomal sequence is
replaced by the defective sequence originally that of the
incoming DNA. Such a procedure has been applied to the
monensin-producing strain of S. cinnamonensis as described
in detail below, and a strain of S. cinnamonensis has been
obtained that carries a specific deletion in the monensin
cluster and which is unable to produce the antibiotic. The
use of such a strain facilitates the production of
heterologous polyketides by removal of the background of
monensin production.
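
The net outcome of the double recombination described above
can be pictured with a toy string model. This is a sketch
under stated assumptions, not a description of the actual
constructs: the function name double_crossover and all
sequences are hypothetical, and real homology arms are far
longer than the short strings used here.

```python
def double_crossover(chromosome: str, left_arm: str,
                     payload: str, right_arm: str) -> str:
    """Toy model of gene replacement: the chromosomal segment lying
    between two homology arms is swapped for the payload carried
    between the same arms on the incoming DNA. An empty payload
    models a clean deletion."""
    left = chromosome.find(left_arm)
    if left == -1:
        raise ValueError("left homology arm not found in chromosome")
    right = chromosome.find(right_arm, left + len(left_arm))
    if right == -1:
        raise ValueError("right homology arm not found downstream")
    return chromosome[:left + len(left_arm)] + payload + chromosome[right:]


# Hypothetical sequences, for illustration only.
chromosome = "AAAA" + "GATTACA" + "CCCCGGGG" + "TTGACAT" + "TTTT"
deleted = double_crossover(chromosome, "GATTACA", "", "TTGACAT")
print(deleted)
```

The model ignores the intermediate single-crossover
integrant and any selection steps; it only shows why the
sequence between the two arms is lost or replaced.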

The multiple uses of portions of the cloned and
sequenced DNA from the monensin cluster will readily occur
to the person skilled in the art. A feature of
the PKS of the monensin cluster is an unusual mechanism of
polyketide chain initiation. We have found that the
monensin PKS loading module has three domains, which from
the amino-terminus of the protein are: a KSq domain, an
acyltransferase domain and an ACP domain. We have
uncovered this organisation in the PKS for the 14-membered
macrolide oleandomycin as well as in the monensin PKS, an
organisation of the loading module previously only found
for the 16-membered macrolides and in which the KSq domain
(which looks like a ketosynthase or condensation domain
except that the active site cysteine residue is
substituted by a glutamine for which the single letter
notation is Q) had been previously speculated to have no
function. It was realised that the acyltransferase of the
loading module actually has malonyl-CoA and not acetyl-CoA
as a substrate and that KSq is an active decarboxylase. It
appears that a better discrimination can be achieved in
the selection of the smaller acetate unit over propionate
if the choice is made initially between methylmalonyl- and
malonyl-CoA.

An unprecedented feature of the monensin PKS genes is
that no integral chain-terminating domain is present as a
C-terminal appendage of the PKS extension module that
catalyzes the twelfth and final chain extension. Because
the product of the monensin PKS is a carboxylic acid, it
would have been firmly predicted that chain release would
have been catalyzed by such a C-terminal domain containing
a "thioesterase" activity. Previously sequenced PKS gene
sets have been of two sorts: first, those macrolide PKSs
typified by erythromycin, spiramycin, tylosin, niddamycin
which have a readily recognisable C-terminal
"thioesterase" domain, which in these enzymes functions as
a specific cyclase rather than releasing the polyketide
product as a free carboxylic acid; secondly, those
macrolide PKSs typified by rapamycin, FK506, and
rifamycin, where there is an alternative and recognised
mode of chain termination by transfer of the polyketide
chain to an acceptor moiety, catalyzed by a specific
enzyme (e.g. pipecolate incorporating enzyme for rapamycin
(Schwecke T. et al. Proc. Natl. Acad. Sci. USA (1995)
92:7839-7843) and FK506 (Motamedi H. and Shafiee A. Eur.
J. Biochemistry (1998) 256:528-534); arylamine synthetase
for rifamycin (August P.R. et al. Chemistry & Biology
(1998) 5:69-79)).

The monensin PKS surprisingly falls into neither
category, and therefore seems to be the first example of a
novel mode of chain termination. It is novel and
noteworthy in this connection that the monensin PKS gene
cluster contains two small genes that encode discrete,
monofunctional thioesterase enzymes. Although many PKS
gene clusters have been previously shown to contain one
such discrete thioesterase, none has been shown to have
two. The role of such thioesterases is not known, although
in the case of the methymycin/pikromycin PKS, which has
been reported to be responsible for the biosynthesis of
both the 12-membered macrolide methymycin and the
14-membered macrolide pikromycin (Xue Y.Q. Proc. Natl.
Acad. Sci. USA (1998) 95:12111-12116) the disruption of
this thioesterase reportedly caused a ten-fold drop in the
amount of both macrolides produced. A similar finding has
been reported for the discrete thioesterase of the tylosin
PKS gene cluster (Cundliffe E. et al. Chemistry & Biology
in press). Additional copies of such thioesterases may
therefore accelerate the production of a specific
polyketide, but this has not yet been demonstrated.
However, the presence of the discrete thioesterase is not
completely essential for polyketide production.

It is highly desirable to have a broadly effective
method of catalysing the release of polyketide gene
products from a PKS as the free acid. The well-studied
integral thioesterase domain of the erythromycin PKS
has a broad specificity in cyclization to
form a lactone (assuming that a hydroxy group is present
in the growing polyketide chain at an appropriate
position), but hydrolysis to form the free acid is very
slow. The recognition of the unusual arrangement of the
monensin PKS means that it is now possible to harness
either the entire PKS module that catalyses the twelfth
and final extension cycle in monensin biosynthesis, or the
C-terminal portion of it, and graft it onto a different
polyketide synthase by genetic engineering, so as to allow
the release mechanism characteristic of monensin to
operate in a different context. The use of this portion
only of the monensin PKS suffices to allow the novel
mechanism of chain release to operate successfully. The
speed of the polyketide chain hydrolysis in a given case
can depend on the additional presence of one or both of
the discrete thioesterase genes (monAIX and monAX) from
the monensin gene cluster. The use of this novel method of
chain termination represents a valuable way of generating
a large number of novel engineered polyketides that are
currently inaccessible, and ensuring that the products
have a specified chain length.

The genes monBI and monBII appear to encode very
similar enzymes with significant amino acid sequence
similarity to authentic ketosteroid isomerases which are
known to catalyse the migration of an activated
carbon-carbon double bond. The conservation of active site
residues makes it very likely that these mon genes govern
a reaction involving activated double bonds in the
biosynthetic pathway to monensin and this surprising
observation can be accommodated if the initial product of
the polyketide chain growth on the monensin PKS is a
linear precursor in which the double bonds are initially
formed with a conventional trans or E (entgegen) geometry;
but before the polyketide chain is extended by insertion
of the next unit the monBI and/or the monBII gene
product(s) catalyse the specific rearrangement of the
newly-created double bond into the cis or Z (zusammen)
geometry. This new view of the monensin biosynthetic
pathway allows the deduction that the monBI and monBII
genes, perhaps in combination with specific portions of
the monensin modules where they normally exert their
effects (namely modules 3, 5 and 7), might be used in order
to achieve the extremely desirable targeted biosynthesis
of novel polyketides containing double bonds with Z
geometry at specified point(s) along the chain. Thus for
example it should be possible to provide for the direct
biosynthesis of the C22-C23 cis or Z double bond in
avermectins, thus avoiding tedious and expensive chemical
conversion of an initial fermentation product into this
important anthelminthic. Only limited experimentation is
needed to see whether the monBI and/or monBII gene
products are sufficient or whether the mon PKS at modules
3, 5 and 7 forms part of the specific docking site(s) for
the isomerases and therefore must also be used in the
creation of the hybrid PKS that will insert the cis or Z
double bond at the desired position. The substrate
specificity of the isomerases need not be limited to
2,3-unsaturated thioesters. The purified enzymes could
also be used to effect such isomerisations in vitro,
depending on the position of the equilibrium or whether
further enzymes are used to achieve the further
transformation of the product as it is formed (vide
infra).

The product of the monCI gene is a novel oxidative
enzyme with some sequence similarity to authentic examples
of such enzymes in the databases; and with a clearly
definable role in the monensin biosynthetic pathway, the
epoxidation of the double bonds at three separate
positions in the initially-formed acyclic intermediate in
monensin biosynthesis. This epoxidase could therefore be
used in conjunction with monBI/monBII gene products to
effect oxidative reactions on suitable substrates in vitro
and in vivo. Similarly the monCII gene product is a
putative cyclase that opens the epoxides and causes the
formation of ether rings in monensin.

Any or all of the monBI, monBII, monCI or monCII
genes may be introduced into a heterologous strain
containing the gene cluster for another polyether, in
order to divert the biosynthetic pathway and produce a
polyketide of altered structure. In these experiments the
analogues of these monB genes could either be present or
(once located and characterised using the mon genes as
probes) they may be deleted prior to the introduction of
the monB and monC genes into that strain. The converse
experiment in which analogues of the monB and monC genes
from other strains are introduced into S. cinnamonensis
likewise has the potential to produce novel oxidised
polyketides. Also, the monB and monC genes or their
analogues may be introduced into a strain that normally
produces a macrolide or a polyene or some other complex
polyketide and expressed there, when they may effect the
diversion of the growing polyketide chain on a
heterologous modular PKS towards a new product, which may
or may not have the structure of a polyether.

The availability of the monensin gene sequence allows
the institution of domain swaps to alter the
acyltransferase (AT) specificity of a given module. For
example, the ethylmalonyl-CoA specific AT found in
one of the modules of the monensin PKS can be used to
replace one of the other ATs to generate an ethyl side
branch at that position in the chain, or the AT can be
used to substitute in any other (e.g. macrolide) PKS, as
described in WO 98/01571 and WO 98/01546. Similarly the
alteration of the level of reduction in a module, by
manipulation of the reductive enzymes, can be applied to
the monensin genes and here it will produce, depending on
which module is affected, either an altered monensin, or a
species which is only partly cyclised, or a polyether with
an altered pattern of cyclisation, or even a linear
polyketide.

In general the targeted alteration of the pattern of
substitution of sidechains or reduction level along the
polyketide chain produced by the monensin PKS will, like
the disruption or deletion of the oxidative enzymes
mentioned above, lead to non-polyether polyketide
products. It should be possible, by introduction of the
DEBS thioesterase at the C-terminus of one of the later
modules of the monensin PKS, together with an
appropriately placed hydroxy group earlier in the chain,
to produce novel macrolide products from this polyether
PKS system, or alternatively novel polyenes of defined
chain length and chosen ring size.

Example 1

Cloning of the monensin A biosynthetic gene cluster using
DNA probes derived from the erythromycin-producing
polyketide synthase of Saccharopolyspora erythraea
A genomic library of the monensin A producing strain
Streptomyces cinnamonensis ATCC 15413 was constructed
using methods well-known in the art, namely, the
production of high molecular weight genomic DNA, followed
by the partial cleavage of this DNA using the
frequent-cutting restriction enzyme Sau3A, fractionation
of the fragments on a sucrose gradient and selection of
fragments of average size 35-40 kbp, and the cloning of
these fragments into the cosmid vector pWE15 (Evans, G.A.
et al. Gene (1989) 79:9-20) which had been previously
digested with BamHI and treated with shrimp alkaline
phosphatase. The library was packaged and transfected into
Escherichia coli XL-1 Blue MR cells. The library was
plated out on 2xTY agar medium (10 g tryptone, 10 g yeast
extract, 5 g NaCl, 15 g bactoagar per litre) containing
ampicillin (50 µg/ml) for cosmid selection and the
colonies were allowed to grow overnight. The library was
then screened by hybridisation using as a probe DNA
encoding the ketosynthase domain of module 1 of the
erythromycin-producing PKS (6-deoxyerythronolide B
synthase, DEBS) of Saccharopolyspora erythraea. The
colonies giving a positive hybridisation signal were
selected and the cosmid DNA from each colony was purified
and mapped by restriction digestion. The presence of the
target biosynthetic genes on a cosmid was verified by
sequencing of the ends of the cosmid inserts using the
commercially available T3 and T7 primers which hybridise
specifically to the respective ends of each cosmid insert
(Evans, G.A. et al. Gene (1989) 79:9-20).
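
As a rough planning aid for a cosmid library of this kind,
the number of clones to be screened can be estimated with
the standard coverage relation N = ln(1 - P) / ln(1 - I/G),
where I is the insert size, G the genome size and P the
desired probability that a given locus is represented. The
sketch below is illustrative only: the 8 Mbp genome size is
an assumption typical of Streptomyces and is not a figure
from this specification, and the function name
clones_needed is hypothetical.

```python
import math


def clones_needed(insert_bp: float, genome_bp: float,
                  probability: float = 0.99) -> int:
    """Estimate how many random clones are required so that any given
    locus is present in the library with the stated probability."""
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


# 37.5 kbp is the midpoint of the 35-40 kbp inserts described above;
# the 8 Mbp genome size is an ASSUMED value, not taken from the text.
print(clones_needed(37_500, 8_000_000, probability=0.99))
```

The estimate assumes random, unbiased cloning; partial
Sau3A digestion only approximates this.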
Example 2 

Sequencing of the biosynthetic gene cluster for monensin A
from Streptomyces cinnamonensis

Three cosmids obtained by screening of the genomic
library of S. cinnamonensis were used to obtain the entire
DNA sequence of the monensin biosynthetic gene cluster.
These cosmids, MO.CN02, MO.CN11 and MO.CN33, between them
contain the entire DNA sequence of the cluster and the
adjacent regions of the chromosome. They have been
deposited in NCIMB, 23 St Machar Drive, Aberdeen AB24
3RY, UK, under the NCIMB accession numbers 40956
(MO.CN11); 40957 (MO.CN33) and 40958 (MO.CN02)
respectively.

The DNA of each cosmid was separately subjected to
partial digestion with Sau3A and fragments of
approximately 1.5-2.0 kbp were separated by agarose gel
electrophoresis. The fragments were then ligated into the
plasmid vector pUC18 (Messing, 1982), previously digested
with BamHI and treated with shrimp alkaline phosphatase.
The library was transformed into E. coli strain XL1-Blue
MR and plated on 2xTY agar medium containing ampicillin
(100 µg/ml) to select for plasmid-containing cells.
Plasmid DNA was purified from individual colonies and
sequenced using the Sanger dye-terminator procedure on an
ABI 377 automated sequencer (Sanger, F. Science (1981)
214:1205-1210). The sequence data obtained from single
random subclones of a cosmid were assembled into a single
continuous sequence and edited using the GAP4.1 program of
the STADEN gene analysis package (Staden, R. Molecular
Biotechnology (1996) 5:233-241).

The sequence is set out in the appended sequence
listing.

Tables I and II contain data about individual genes 
and gene products. 
Example 3

Inactivation of the mon[...]

A chromosomal gene disruption experiment was used to
verify the identity of the cloned polyketide synthase gene
cluster. Plasmid pMOB6314 is a pUC18 sequencing subclone
of the presumed monensin A biosynthetic gene cluster
prepared as described in Example 1, whose inserted DNA
comprises the DNA sequence from nucleotide 9763 to
nucleotide 10108 in SEQ ID 1, and which therefore contains
a region of DNA wholly internal to orfE, a putative
3-O-methyltransferase. A HindIII fragment containing the
thiostrepton resistance gene tsr from plasmid pIJ702
(Katz, E. et al. J. Gen. Microbiol. (1983) 129:2703-2714)
was cloned into the HindIII site of plasmid pMOB6314 and
the ligation mixture was used to transform E. coli cells.
Transformants bearing the required plasmid pMOAE01 were
identified by isolation of plasmid DNA and analysis by
restriction digestion. Plasmid pMOAE01 was used
to transform protoplasts of Streptomyces cinnamonensis as
described by Hopwood D.A. et al. (1985). Since plasmid
pMOAE01 lacks an origin of replication that is active in
Streptomyces, growth in the presence of thiostrepton (25
µg/ml) in the regeneration medium led to the isolation of
stable integrants. Isolated putative integrants were
tested for the presence of integrated pMOAE01 sequences by
Southern hybridisation. A clone of Streptomyces
cinnamonensis identified by its restriction pattern in
Southern hybridisation as bearing pMOAE01 integrated in
the region of monE of the monensin A biosynthetic gene
cluster was designated S. cinnamonensis MO-DD01.

Production of monensin A related metabolites by
S. cinnamonensis MO-DD01 was assessed by GC-MS analysis of
methanol extracts of the entire broth harvested after 72
hours of growth of the strain. No significant amounts of
monensin A-related metabolites were detectable.
Example 4 

Overproduction of erythromycin aglycone in Streptomyces
cinnamonensis

S. cinnamonensis is a suitable system for
overproduction not just of monensin A but also of other
polyketide metabolites. Established techniques of genetic
transformation allow fast introduction of foreign
polyketide-producing gene sets into this host. Fast
growth of S. cinnamonensis in liquid culture and optimal
precursor supply favour high yield of polyketide
metabolites.

Construction of pIB061

S. erythraea NRRL2338 was transformed with pCJR30
(Rowe, C. J. et al. (1998) Gene 216:215-223) using a
routine protoplast transformation technique as described
by Hopwood et al. (1985). A stable integrant of S.
erythraea [pCJR30] was identified and the production of
10 mg/L of the triketide lactone (delta lactone of
(2S,3R,4R,5R)-2,4-dimethyl-3,5-dihydroxy-heptanoic acid)
in addition to erythromycins was confirmed by MS
analysis.

Total DNA of S. erythraea [pCJR30] was purified and
approximately 200 ng was digested with EcoRI endonuclease.
The digestion mixture was precipitated with isopropanol
and the resulting DNA was treated with T4 DNA ligase for
16 hours at 16 °C. The ligation mixture was used to
transform E. coli DH10B cells. The transformants were
screened for the presence of the plasmid. A clone
containing a 44.7 kb plasmid was identified and confirmed
by restriction analysis to contain three complete genes:
eryAI, eryAII and eryAIII. The plasmid was named pIB061.

Transformation of S. cinnamonensis

Protoplasts of S. cinnamonensis were prepared by a
modified procedure of Hopwood et al. (1985). Plasmid
pIB061 was transformed into the protoplasts of S.
cinnamonensis and stable thiostrepton resistant colonies
were isolated. Individual colonies were checked for their
plasmid content and the presence of plasmid pIB061 was
confirmed by its restriction pattern. S. cinnamonensis
(pIB061) was inoculated into 250 ml of M-C3 minimal
production medium containing 10 µg/ml of thiostrepton and
allowed to grow for 72 hours at 30 °C. After this time the
mycelia were removed by filtering. The broth was extracted
with two volumes of ethyl acetate and the combined ethyl
acetate extracts were washed with an equal volume of
saturated sodium chloride, dried over anhydrous sodium
sulphate, and the ethyl acetate was removed under reduced
pressure to give about 200 mg of crude product. The
product was analysed by LCQ and its mass was confirmed to
correspond to that of erythronolide B.

This example demonstrates the importance of S.
cinnamonensis for production of high levels of foreign
polyketide antibiotics. Introduction of the complete
erythromycin gene cluster or other gene clusters into this
system is likely to produce high levels of the
corresponding metabolites.

Example 5

Construction of plasmid pCJW58 containing the monensin 
activator gene under the ermE* promoter 

The ermE* promoter derived from the ermE resistance
methyltransferase gene of S. erythraea (Bibb et al. Gene
(1985) 38:215-226) was amplified by PCR as a SpeI-XbaI
fragment using the following oligonucleotides
5'-CCACTAGTATGCATGCGAGTGTCCGTTCGAGT-3' and
5'-TTGTATACACCTAGGATGGTTGGCCGTGC-3' with pRH3 (Dhillon et
al. Molecular Microbiology (1989) 3:1405-1414) as a
template and cloned into SmaI-digested,
phosphatase-treated pUC18, to produce plasmid pIB135. The
integrative plasmid pSET152 (Bierman, M. et al. (1992)
Gene 116:43-49) was digested with XbaI and the backbone
was dephosphorylated and ligated to the SpeI-XbaI fragment
of pIB135 containing the ermE* promoter. The ligation
mixture was used to transform E. coli DH10B and the
orientation of the insert in the plasmids from individual
clones was checked by restriction analysis. A plasmid with
the ermE* promoter oriented so that the NdeI and XbaI
sites are adjacent to the apramycin resistance gene was
selected and named pIB139.

The monR gene from the monensin biosynthetic gene
cluster was amplified and NdeI and XbaI restriction sites
introduced at 5' and 3' ends respectively, by PCR using as
primers the following oligonucleotides:
5'-AGA TAG CAT ATG GTG GGC CCG CTC CGC AT -3'
and 5'-AAT GCT CTA GAG TGT GAG GGA CGG GAG AGG GGG AA-3'
and cosmid MO.CN11 as template. The PCR product was
ligated into SmaI-treated and phosphatase-treated plasmid
pUC18 and the ligation mixture was used to transform E.
coli DH10B cells. Transformant colonies were analysed for
the presence of plasmid and the identity of the plasmid
inserts was verified by sequencing. A plasmid whose
insert contained the monR gene flanked by NdeI and XbaI
restriction sites was selected and designated pCJW57.
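
The restriction sites engineered into the two monR primers
can be confirmed by a simple text search. The sketch below
is illustrative: the primer strings are copied as printed
above (and so inherit any transcription errors), the
recognition sequences CATATG (NdeI) and TCTAGA (XbaI) are
the standard ones for these enzymes, and the helper name
find_sites is hypothetical. Both sites are palindromic, so
searching a single strand is sufficient.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"NdeI": "CATATG", "XbaI": "TCTAGA"}


def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based position} for each recognition site
    present in the primer, ignoring spaces and letter case."""
    seq = primer.replace(" ", "").upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


# Primers as printed in this example (5' to 3').
forward = "AGA TAG CAT ATG GTG GGC CCG CTC CGC AT"
reverse = "AAT GCT CTA GAG TGT GAG GGA CGG GAG AGG GGG AA"

print("forward:", find_sites(forward))
print("reverse:", find_sites(reverse))
```

The NdeI site in the forward primer overlaps an ATG, which
is consistent with its use to place the start of the open
reading frame directly after the promoter fragment.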

Plasmid pCJW57 was digested with NdeI and XbaI and
the fragment containing the monR gene was ligated together
with the backbone of plasmid pIB139 which had been
digested with the same two restriction enzymes, and
purified by gel elution. The ligation mixture was used to
transform E. coli strain DH10B cells. Transformant
colonies were analysed for the presence of plasmid and the
identity of the plasmid inserts was verified by
restriction analysis. One such recombinant was selected
and named plasmid pCJW58.

Plasmid pCJW58 was used to transform the
methylation-deficient E. coli strain ET12567 (MacNeil D.
J. et al. (1992) Gene 111:61-68) and the recovered,
unmethylated plasmid was then used to transform the same
E. coli strain ET12567 housing the plasmid pUB307, a
derivative of RP4 which is mob+ and which contains a gene
for kanamycin resistance (Piffaretti, J. C. et al. (1988)
Mol. Gen. Genet. 212:215-218). Recombinants were plated on
2xTY agar medium containing apramycin and kanamycin at
final concentrations of 50 micrograms per ml and 50
micrograms per ml respectively. The plasmid content of
recombinants was checked by isolation of plasmid DNA and
verification of the identity of these plasmids by
restriction analysis. One such clone which contained both
pUB307 and plasmid pCJW58 was selected and used for
further experiments.

Construction of Streptomyces cinnamonensis (pCJW58) 
and production of monensins 

A single colony of E. coli ET12567 housing both
pUB307 and pCJW58 was toothpicked into 3 ml of TY liquid
medium, containing apramycin and kanamycin at 25 and 25
micrograms per ml respectively, and grown overnight at
37 °C. This culture was used to inoculate 25 ml of TY
medium, supplemented with the same antibiotics at the same
concentrations, and growth was continued until the
absorbance at 600 nm (1 cm pathlength) was between 0.3 and
0.6. The cells were centrifuged (room temperature, 7
minutes, 2000 x g), resuspended in TY liquid medium (10
ml) containing no added antibiotics, re-centrifuged as
before, then resuspended in 2 ml of TSB medium and placed
on ice. Meanwhile, 0.5 ml of TSB medium was added to 100
microlitres containing approximately 10^8 spores of S.
cinnamonensis. After a brief heat shock, at 50 °C for 10
minutes, the suspension was briefly cooled, mixed with
0.5 ml of donor E. coli cells, and plated on solid A
medium, which has composition as follows:

A medium

Sigma wheat starch      5 g
Corn steep powder       1.25 g
Yeast extract           1.5 g
CaCO3                   1 .^5g
FeSO4                   6 mg
DIFCO agar              10 g
H2O                     to 500 ml

pH adjusted to pH 7 with KOH.
MgCl2 was in addition added to a final concentration of
10 mM.

The plates were allowed to dry overnight at room
temperature, and were then allowed to incubate a further
18 hours at 30 °C. After this time each 25 ml plate was
overlaid with a solution of apramycin (final concentration
50 micrograms per ml) and nalidixic acid (final
concentration 20 micrograms per ml), and the plates were
allowed to incubate for four days at 30 °C. At this time
individual colonies were toothpicked onto solid A medium
and allowed to grow. Four representative colonies from
the A medium plate were grown up in liquid modified YEME
medium, which has composition as follows:

Modified YEME medium

Sucrose                 100 g
DIFCO Yeast extract     3 g
Bacto peptone           5 g
Oxoid Malt extract      3 g
Glucose                 10 g
H2O                     to 1 L

pH adjusted to pH 7.2 with NaOH.
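
Because the recipe is stated per litre while the cultures
are grown in much smaller volumes, the component weights
must be scaled. A minimal sketch of that arithmetic follows;
the dictionary and function names are hypothetical, and the
30 ml volume is used only as a worked example.

```python
# Modified YEME components in grams per litre, as listed above.
YEME_G_PER_LITRE = {
    "sucrose": 100.0,
    "yeast extract": 3.0,
    "peptone": 5.0,
    "malt extract": 3.0,
    "glucose": 10.0,
}


def scale_recipe(grams_per_litre: dict, volume_ml: float) -> dict:
    """Return the grams of each component needed for volume_ml of medium."""
    return {name: grams * volume_ml / 1000.0
            for name, grams in grams_per_litre.items()}


# Worked example: amounts for a single 30 ml culture.
for name, grams in scale_recipe(YEME_G_PER_LITRE, 30).items():
    print(f"{name}: {grams:.2f} g")
```

In practice a larger batch would normally be prepared and
dispensed, since weighing 0.09 g quantities accurately is
inconvenient.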

These cultures were used to provide a 2% vol/vol
inoculum for 30 ml of modified YEME which was grown for 7
days, and then transferred to SM16 medium, which has
composition as follows:
SM16 medium

3-[N-Morpholino]-propanesulfonic acid
(MOPS) buffer                               20.9 g
L-proline                                   10.0 g
Glucose                                     20 g
NaCl                                        0.5 g
K2HPO4                                      2.1 g
Ethylenediaminetetraacetic acid, sodium
salt                                        0.25 g
MgSO4.7H2O                                  0.49 g
CaCl2.2H2O                                  0.029 g
Trace elements solution (Hopwood,
D. A. et al. (1985) Genetic Manipulation
of Streptomyces - a Laboratory Manual,
at p. 235)                                  2 ml
0.5 M CoCl2 solution                        2 microlitres
H2O                                         to 1 L

pH adjusted to pH 7 with NaOH.

After growth for a further 7 days, mycelium was
collected by centrifugation at 2000 x g for 30 minutes,
and the supernatant was extracted three times with 300 ml
of ethyl acetate. The combined extracts were concentrated
by evaporation under reduced pressure to an oil, which was
mixed with 1 ml of methanol. Samples were applied to an
LCQ liquid chromatograph fitted with a mass spectrometer
detector unit. The column used was a C18 reversed phase
column, equilibrated with a mixture of 80% 20 mM ammonium
acetate/20% acetonitrile, and the column was eluted with a
gradient of increasing acetonitrile, reaching 100%
acetonitrile over 24 minutes. Monensins A and B emerged
from the column with retention times respectively of 8.2
minutes and 9.2 minutes. The relative amounts of monensin
produced by three independent clones (A-C) containing an
additional copy of the monR gene were compared to a
control fermentation of the wild type S. cinnamonensis
strain, with the results shown in the Table below:
Table showing increased monensin production in strains
bearing additional copy of monR gene

Strain     monensin A           monensin B
           concentration        concentration
           (arbitrary units)    (arbitrary units)
Control    188                  861
A          430                  1800
B          450                  1300
C          249                  1300
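The comparison in the table can be expressed as fold changes relative to the control fermentation. The sketch below is illustrative only: the values are copied from the table, the monensin B value for clone B is read as 1300 from an unclear scan, and the source itself reports only the arbitrary units, not fold changes.

```python
# (monensin A, monensin B) in arbitrary units, copied from the table above.
# The monensin B value for clone B is read as 1300; the scan is unclear there.
PRODUCTION = {
    "Control": (188, 861),
    "A": (430, 1800),
    "B": (450, 1300),
    "C": (249, 1300),
}


def fold_changes(data, control="Control"):
    """Return each strain's production divided by the control's, per compound."""
    ctrl_a, ctrl_b = data[control]
    return {
        strain: (mon_a / ctrl_a, mon_b / ctrl_b)
        for strain, (mon_a, mon_b) in data.items()
        if strain != control
    }


folds = fold_changes(PRODUCTION)
for strain, (fold_a, fold_b) in folds.items():
    print(f"clone {strain}: monensin A x{fold_a:.2f}, monensin B x{fold_b:.2f}")
```

On these figures every clone carrying the additional monR copy produces more of both monensins than the control.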

Example 6

Construction of S. cinnamonensis M12AT5

A region lying immediately 5' of the DNA encoding the
acyltransferase (AT12) domain of module 12 of the monensin
polyketide synthase in the monensin biosynthetic gene
cluster was amplified with the following primers: 5'-
GGTGGCCACGGAAACACCAACACCGGACCCGCGCC-3' and 5'-
CTCTCGGAGGCCCGGCGCAACGGCCACAA-3', using cosmid MO-CN11
as a template. The PCR product was ligated into SmaI-
digested and phosphatase-treated plasmid pUC18 and the
ligation mixture was used to transform E. coli DH10B
cells. Transformant colonies were analysed for the
presence of plasmid and the identity of the plasmid
inserts was verified by sequencing. A plasmid whose
insert contained a fragment upstream of the AT12-encoding
sequence from about 82.3kb to 83.2kb of the mon cluster
was designated pM081. Similarly a region lying immediately
3' of the DNA encoding the acyltransferase (AT12) domain
of module 12 of the monensin polyketide synthase in the
monensin biosynthetic gene cluster was amplified with the
following primers: 5'-GGCCTAGGGCTGCCTCGGGTGGTGGATCTGCCGA-
3' and 5'-TGGTCGGGCGCGGTGCGTGCGATACGT-3', using cosmid
MO-CN11 as a template. The PCR product was ligated into
SmaI-treated and dephosphorylated pUC18 and the ligation
mixture was used to transform DH10B E. coli cells.
Transformant colonies were analysed for the presence of
plasmid and the identity of the plasmid inserts was
verified by sequencing. A plasmid whose insert contained
a fragment downstream of the AT12-encoding sequence, from
80.5kb to 81.4kb of the mon cluster, was designated pM082.

The DNA encoding AT of module 5 was amplified and
MscI and AvrII restriction enzyme recognition sites were
introduced at the ends by PCR using the following primers:
5'-CCTGGCCAGGGCGGCCAGTGGGTGGGCATG-3' and 5'-
GGCCTAGGGGTCGGCCGGGAACCAGCGCCGCCAGT-3' and the cosmid MO-
CN33 as a template. The PCR product was ligated into SmaI-
treated and dephosphorylated pUC18 and the ligation
mixture was used to transform DH10B E. coli cells.
Transformant colonies were analysed for the presence of
plasmid and the identity of the plasmid inserts was
verified by sequencing. A plasmid whose insert DNA, with
sequence from about 44.2kb to 45.2kb of the mon cluster,
encoded the AT5 domain was designated pM083.
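The statement that the AT5 primers introduce MscI and AvrII sites can be checked by searching each primer for the enzyme's recognition sequence. The sketch below does this with plain string search. The recognition sequences (MscI: TGGCCA, AvrII: CCTAGG) are standard values and are not given in the source. Both sites are palindromic, so searching one strand is sufficient.

```python
# Standard recognition sequences (not stated in the source text).
RECOGNITION_SITES = {"MscI": "TGGCCA", "AvrII": "CCTAGG"}

# AT5 primers as printed above, written 5' to 3'.
PRIMERS = {
    "AT5 forward": "CCTGGCCAGGGCGGCCAGTGGGTGGGCATG",
    "AT5 reverse": "GGCCTAGGGGTCGGCCGGGAACCAGCGCCGCCAGT",
}


def find_sites(sequence, sites=RECOGNITION_SITES):
    """Map enzyme name -> 0-based position of its first site in the sequence."""
    return {
        enzyme: sequence.find(site)
        for enzyme, site in sites.items()
        if site in sequence
    }


site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}
for name, found in site_map.items():
    print(name, found)
```

With the primers as printed, the forward primer carries the MscI site and the reverse primer the AvrII site, each starting two bases from the 5' end.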

pM081 was digested with MscI and HindIII and ligated
to the 0.9kb MscI - HindIII fragment of pM082. A clone
containing both fragments was designated pM084. Plasmid
pM084 was cleaved with AvrII and HindIII, treated with
phosphatase, and ligated together with the 1.0 kb AvrII -
HindIII fragment of pM083 to produce pM085, which contains
the DNA encoding the AT5 domain flanked by DNA from either
side of the DNA encoding the AT12 domain of the monensin
PKS. The thiostrepton resistance gene tsr, derived from
plasmid pIJ702 (Katz, E. et al., J. Gen. Microbiol.
1983), was cloned into the HindIII site of pM085. The
resulting plasmid pM086 was analysed by its restriction
pattern and confirmed to contain all the desired
elements.

Plasmid pM086 was used to transform S. cinnamonensis
protoplasts as described by Hopwood, D. A. (1985). Stable
thiostrepton-resistant transformants were isolated and
checked for the desired integration of the pM085 into the
AT12 flanking regions by Southern blot hybridisation. One
such integrant, S. cinnamonensis MO-08, containing pM085
integrated upstream of the AT12, was passed through 4
cycles of sporulation on a non-selective nutrient
medium. Spores obtained after the fourth cycle were
replica-plated onto media with and without thiostrepton.
DNA of clones that had lost thiostrepton resistance was
analysed by Southern blot hybridisation. Clones in which
the DNA encoding the AT12 domain had been replaced by the
DNA encoding the AT5 domain were designated S.
cinnamonensis M12-AT5. At this time individual colonies
were toothpicked onto solid A medium and allowed to grow.
Four representative colonies from the A medium plate were
grown up in liquid modified YEME medium, which has
composition as follows:
Modified YEME medium

Sucrose                   100 g
DIFCO Yeast extract       3 g
Bacto peptone             5 g
Oxoid Malt extract        3 g
Glucose                   10 g
H2O                       to 1 L

pH adjusted to pH 7.2 with NaOH.

These cultures were used to provide a 2% vol/vol
inoculum for 30 ml of modified YEME which was grown for 7
days, and then transferred to SM16 medium, which has
composition as follows:

SM16 medium

3-[N-Morpholino]-propane sulfonic
  acid (MOPS) buffer                          20.9 g
L-proline                                     10.0 g
Glucose                                       20 g
NaCl                                          0.5 g
K2HPO4                                        2.1 g
Ethylenediaminetetraacetic acid,
  sodium salt                                 0.25 g
MgSO4.7H2O                                    0.49 g
CaCl2.2H2O                                    0.029 g
Trace elements solution (Hopwood,
  D. A. et al. (1985) Genetic
  Manipulation of Streptomyces - a
  Laboratory Manual, at p. 235)               2 ml
0.5 M CoCl2 solution                          2 microlitres
H2O                                           to 1 L

pH adjusted to pH 7 with NaOH.

After growth for a further 7 days, mycelium was
collected by centrifugation at 2000 x g for 30 minutes,
and the supernatant was extracted three times with 300 ml
of ethyl acetate. To confirm presence of the C-2-ethyl
substituents of both monensin A and B the combined
extracts were concentrated by evaporation under reduced
pressure to an oil, which was mixed with 1 ml of methanol.
Samples were applied to an LCQ liquid chromatograph fitted
with a mass spectrometer detector unit. The column used
was a C18 reversed phase column, equilibrated with a
mixture of 80% 20 mM ammonium acetate/20% acetonitrile, and
the column was eluted with a gradient of increasing
acetonitrile, reaching 100% acetonitrile over 24 minutes.
Mass ions 14 mass units above those expected for both
monensin A and B confirmed production of the respective
C-2-ethyl substituents.
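The 14 mass unit shift reported above is what one additional methylene unit would contribute. The short calculation below uses nominal atomic masses and assumes that the native C-2 substituent is a methyl group, which the text implies but does not state.

```latex
\Delta m = m(\mathrm{C_2H_5}) - m(\mathrm{CH_3})
         = m(\mathrm{CH_2})
         = 12 + 2 \times 1
         = 14
```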

Example 7. Construction of pSGK005 and its use in the
production of C-13 propyl-erythromycin

Plasmid pSGK005 is a pCJR24 based plasmid containing
a PKS gene comprising a loading module plus the first and
second extension modules and the chain terminating
thioesterase of the PKS responsible for the production of
erythromycin (DEBS). The loading module comprises the KS
and ethylmalonyl-CoA specific AT from module 5 of the
monensin PKS linked to the DEBS loading ACP domain. In
addition, the active site cysteine of this module 5 KS has
been mutated to glutamine to convert an extender di-domain
to a loading di-domain. Plasmid pSGK005 was constructed
as follows.

A 2769bp DNA segment of the monensin cluster of S.
cinnamonensis extending from nucleotide 42438 to 45207 was
amplified by PCR using the following oligonucleotide
primers, 5'-GTGACGTCATATGTCGAGTGCTGAAGAGTCG-3' and
5'-GGGGTCGCCTAGGAACCAGCGCCGCCAGTCGA-3'.
The design of these primers introduced Nde I and Avr
II sites at the ends of the amplified fragment. Monensin
cosmid 05 was used as a template for the reaction. The
resulting 2769bp fragment was digested with Nde I and Xho
I and a 656bp fragment (Fragment A) purified by
preparative gel electrophoresis.

A second PCR reaction was used with the same template
to amplify DNA from nucleotide 43098 to 45207. The
primers used were
5'-CGGCCTCGAGGGCCCGTCGGTCAGTGTCGACACGGCGCAGTCCTCCTCGC-3'
and 5'-GGGGTCGCCTAGGAACCAGCGCCGCCAGTCGA-3'.
The design of the upstream oligonucleotide primer
incorporated a change of the codon specifying the KS
active site cysteine (nucleotides 43135-43137, TGC) to
glutamine (CAG). The resulting 2109bp DNA fragment
(Fragment B) was digested with Xho I and Avr II and
purified by preparative gel electrophoresis.
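The amplicon sizes and the position of the mutated codon can be cross-checked from the coordinates given in the text. The sketch below assumes that the first base of the upstream primer corresponds to nucleotide 43098, which the source does not state explicitly. Under that assumption the codon at nucleotides 43135-43137 falls 37 bases into the primer. The small codon table covers only the two amino acids involved.

```python
# Minimal codon table for the two residues discussed (standard genetic code).
CODON_TO_AMINO_ACID = {"TGC": "Cys", "TGT": "Cys", "CAG": "Gln", "CAA": "Gln"}

UPSTREAM_PRIMER = "CGGCCTCGAGGGCCCGTCGGTCAGTGTCGACACGGCGCAGTCCTCCTCGC"
PRIMER_FIRST_NT = 43098   # assumption: primer 5' base maps to this coordinate
CODON_FIRST_NT = 43135    # first base of the KS active site codon (from the text)


def amplicon_length(start_nt, end_nt):
    """Length as the text counts it: end coordinate minus start coordinate."""
    return end_nt - start_nt


offset = CODON_FIRST_NT - PRIMER_FIRST_NT
primer_codon = UPSTREAM_PRIMER[offset:offset + 3]

first_amplicon = amplicon_length(42438, 45207)
second_amplicon = amplicon_length(43098, 45207)

print(first_amplicon, second_amplicon)
print(primer_codon, CODON_TO_AMINO_ACID[primer_codon])
```

Under the stated assumption the primer carries CAG (glutamine) where the wild-type sequence is reported to have TGC (cysteine).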

Plasmid pCJWSO is derived from pCJR24 and DEBS1-TE in
which Msc I and Avr II sites have been introduced to flank
the AT of the DEBS loading module. This plasmid was
digested with Nde I and Avr II and the larger fragment
(Fragment C) purified by preparative gel electrophoresis.

The three fragments (Fragments A, B, C) were ligated
together using T4 DNA ligase and the ligation mixture used
to transform electrocompetent E. coli DH10B cells.
Individual clones were checked for the presence of the
desired plasmid pSGK005. The identity of pSGK005 was
confirmed by restriction pattern and sequence analysis.
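The three-fragment ligation works because each fragment end is compatible only with one end of another fragment. The sketch below is a simplified model of that logic: ends are labelled only by the enzyme that produced them (Fragment A: Nde I/Xho I, Fragment B: Xho I/Avr II, Fragment C: the Nde I/Avr II vector backbone), and the overhang sequences themselves are not modelled.

```python
# (left end, right end) of each fragment, labelled by the enzyme that cut it.
# Fragment C is written in the orientation that follows Fragment B.
FRAGMENTS = {
    "A": ("NdeI", "XhoI"),
    "B": ("XhoI", "AvrII"),
    "C": ("AvrII", "NdeI"),
}


def closes_into_circle(order, fragments):
    """True if each fragment's right end matches the next one's left end,
    including the join from the last fragment back to the first."""
    for index, name in enumerate(order):
        next_name = order[(index + 1) % len(order)]
        if fragments[name][1] != fragments[next_name][0]:
            return False
    return True


intended = closes_into_circle(["A", "B", "C"], FRAGMENTS)

# Fragment B inserted backwards cannot be joined in this model.
flipped = dict(FRAGMENTS, B=("AvrII", "XhoI"))
backwards = closes_into_circle(["A", "B", "C"], flipped)

print(intended, backwards)
```

Because all three end types differ, only one arrangement and orientation closes into a circular plasmid in this model.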

Plasmid pSGK005 was used to transform S. erythraea
NRRL2338 using a routine protoplast transformation
technique. Thiostrepton resistant colonies were selected
on R2T20 media containing g/ml thiostrepton. Further
analysis confirmed that pSGK005 had integrated into the S.
erythraea NRRL2338 chromosome by Southern blot
hybridisation of their genomic DNA with DIG-labelled DNA
containing the actII orf4 promoter. The culture S.
erythraea NRRL2338 (pSGK005) was inoculated into 5ml tap
water medium in a 30ml flask. After three days
incubation at 29°C this flask was used to inoculate 30ml of
Ery-P medium in a 300ml flask. The broth was incubated at
29°C at 200rpm for 6 days. After this time the whole broth
was adjusted to pH8.5 with NaOH, and then extracted twice
with an equal volume of ethyl acetate. The ethyl acetate
extract was evaporated to dryness at 45°C under a nitrogen
stream using a Zymark Turbovap LV evaporator. The product
identities were confirmed by LC/MS. A peak was observed
with a m/z value of 734 (M+H)+, required for erythromycin A.
A second peak was observed with a m/z value of 748 (M+H)+,
required for 13-propyl erythromycin A.
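The two m/z values can be reproduced from elemental compositions. The sketch below assumes the standard molecular formula of erythromycin A, C37H67NO13, which is not given in the source, and models the 13-propyl analogue as that formula plus one CH2 unit. Monoisotopic atomic masses and the proton mass are standard reference values.

```python
# Monoisotopic atomic masses (standard reference values).
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON_MASS = 1.00727646

# Assumed formula for erythromycin A (not stated in the source).
ERYTHROMYCIN_A = {"C": 37, "H": 67, "N": 1, "O": 13}
# 13-propyl analogue modelled as erythromycin A plus one CH2 unit.
PROPYL_ANALOGUE = {"C": 38, "H": 69, "N": 1, "O": 13}


def monoisotopic_mass(formula):
    """Sum of monoisotopic atomic masses for an element-count mapping."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())


def mz_protonated(formula):
    """m/z of the singly charged (M+H)+ ion."""
    return monoisotopic_mass(formula) + PROTON_MASS


mz_ery = mz_protonated(ERYTHROMYCIN_A)
mz_propyl = mz_protonated(PROPYL_ANALOGUE)
print(f"{mz_ery:.2f} {mz_propyl:.2f} difference {mz_propyl - mz_ery:.2f}")
```

Rounded to unit resolution, the two ions come out at 734 and 748, matching the peaks reported above.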



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TABLE I 



gene      function                                  start     end

gdhA      glutamate dehydrogenase (partial)         1038      0
dapA      dihydrodipicolinate synthase              2140      1220
orf3      putative transcriptional activator        2211      3152
orf4      hypothetical protein                      3264      3680
orf5      hypothetical protein                      4307      3684
orf6      hypothetical protein                      4570      4758
orf7      hypothetical protein                      5058      5612
acpX      acyl carrier protein                      6010      5693
ksX       ketoacyl synthase                         8531      6045
monCII    probable epoxyhydrolase/cyclase           9542      8643
monE      methyltransferase                         10426     9596
monT      monensin resistance gene
          (ABC-transporter)                         10656     12191
monRII    probable repressor                        12205     12780
monAIX    thioesterase                              13829     13023
monAI     polyketide synthase loading &
          module 1                                  14121     23198
            KS-L                                    14172     15486
            AT-L malonate specific                  15777     16880
            ACP-L                                   17019     17276
            KS1                                     17358     18626
            AT1 methylmalonate specific             18960     19976
            DH1 (potential)                         20019     20519
            KR1 (inactive)                          21636     22241
            ACP1                                    22536     22793
monAII    polyketide synthase module 2              23205     29921
            KS2                                     23307     24569
            AT2 methylmalonate specific             24891     25913
            DH2                                     25953     26369
            ER2                                     27600     28463
            KR2                                     28485     29042
            ACP2                                    29313     29570
monAIII   polyketide synthase modules 3 & 4         29974     42372
            KS3                                     30076     31347
            AT3 malonate specific                   31798     32838
            DH3                                     32884     33465
            KR3                                     34692     35181
            ACP3                                    35553     35811
            KS4                                     35899     37170
            AT4 methylmalonate specific             37489     38511
            DH4                                     38557     38982
            ER4                                     40123     40986
            KR4                                     41005     41562
            ACP4                                    41848     42105
monAI     polyketide synthase modules 5 & 6         42448     54564
            KS5                                     42623     43890
            AT5 ethylmalonate specific              44221     45243
            DH5                                     45289     45744
            KR5                                     46785     47337
            ACP5                                    47593     47850
            KS6                                     47947     49218
            AT6 malonate specific                   49579     50601
            DH6                                     50644     51075
            ER6                                     52222     53102
            KR6                                     53101     53661
            ACP6                                    54052     54306
monA      polyketide synthase modules 7 & 8         54614     66934
            KS7                                     54716     55978
            AT7 methylmalonate specific             56300     57319
            DH7                                     57358     57802
            KR7                                     59048     59608
            ACP7                                    59867     60124
            KS8                                     60185     61453
            AT8 malonate specific                   61808     62839
            DH8                                     62882     63316
            ER8                                     64577     65437
            KR8                                     65456     66016
            ACP8                                    66404     66661
monA      polyketide synthase module 9              66952     72054
            KS9                                     67075     68340
            AT9 malonate specific                   68698     69729
            KR9 (potential)                         70735     71262
            ACP9                                    71536     71783
          probable regulator                        72051     74993
monCI     FAD-containing epoxidase                  76541     75051
monBI     double bond isomerase                     76960     76538
          double bond isomerase                     77450     77016
monA      polyketide synthase modules 11 & 12       88708     77447
            KS11                                    88612     87344
            AT11 methylmalonate specific            87022     85993
            KR11                                    85111     84562
            ACP11                                   84292     84035
            KS12                                    83962     82694
            AT12                                    82354     81335
            DH12 (potential) delta                  81286     80855
            ER12 (potential)                        79618     78914
            KR12                                    78895     78337
            ACP12                                   78070     77812
monA      polyketide synthase module 10             93741     88816
            KS10                                    93636     92368
            AT10 methylmalonate specific            92040     91021
            KR10                                    90132     89584
            ACP10                                   89322     89068
                                                    94081     95273
monRI     probable activator                        96141     95338
monA      thioesterase                              96941     96138
orf29     cell wall biosynthesis capK               97580     98953
lipB      lipase B                                  99983     98991
orf31     ion pump                                  101433    100507
orf32     membrane structural protein               102581    101490
amtA      glycine amidinotransferase                102924    103450
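The start and end coordinates in Table I can be turned into an orientation and a length. The sketch below assumes that a start coordinate larger than the end coordinate marks a feature on the reverse strand, and that coordinates are inclusive. Neither convention is stated in the source, but both are consistent with the protein lengths quoted in Table II (for example monT and dapA).

```python
def describe_feature(start, end):
    """Return (strand, length in nucleotides, number of codons).

    Assumes inclusive coordinates and that start > end means reverse strand.
    """
    strand = "+" if start < end else "-"
    length_nt = abs(end - start) + 1
    return strand, length_nt, length_nt // 3


# Coordinates copied from Table I.
FEATURES = {
    "monT": (10656, 12191),
    "dapA": (2140, 1220),
    "AT5": (44221, 45243),
    "AT12": (82354, 81335),
}

described = {name: describe_feature(*coords) for name, coords in FEATURES.items()}
for name, (strand, length_nt, codons) in described.items():
    print(f"{name}: strand {strand}, {length_nt} nt, {codons} codons")
```

For monT and dapA the codon counts (512 and 307) equal the lengths quoted in Table II, which suggests those lengths include the stop codon.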
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TABLE n 

GdhA, glutamate dehydrogenase (partial coding sequence) Length: 346 
amino acids 

1 LTTRPDTKTA LSQKTALSQL LTEIEHRNPA QPEFHQAARE VLETIiAPVIA 
51 ARPEYAEAGL lERLCEPERQ IVFRVPWQDD HGRVRVNRGF RVEFNSALGP 


101 YKGGLRFHPS VNLGVIKFLG FEQIFKNALT GLGIGGGKGG SDFDPRGRSD 
151 AEVMRFCQSF MTELYRHIGE^HTDVPAGDIG VGGREIGYLF GQYRRITNRW 
201 EAGVLTGKGR NP^GGSLIRPE ATGYGNVLFA AAMLRERGET LEGRTAWSG 
251 SGNVAIYTIQ KLAALGANAV TCSDSSGYW DEKGIDLDLL KQVKEVERAR 
301 VDTYAQRRGA SARFVPGRRV WEVPADIALP SATQNELDAD DATALI 

DapA, dihydrodipicolinate synthase Length: 307 amino acids

1 MTLASSLEPT TEPLFUGLYV PLVTPFTDDL RLAPEALARL ADEALSAGAS 

51 GLVALGTTAE AATLTAEERE TVIRVCSAAG RAHGAPLIVG VGTNDTATAI 

101 TALRELAARG DVAAALVPAP PYIRPGEAGT IiAHFAALAEH GGLPLWYDI 

151 PYRTGQTLGA GTITALGRLP EWGIKHATG SIDPTTMELL DSPLPGFAVL 

201 GGDDIVLSPL VAAGAHGGIV ASANLRTADY AEMIALWRRG SAAPARALGA 

251 DIARLSAALF TEPNPTVIKG VLHAQNRIPS PAVRMPLLAA SADSVRRAAP 

301 LAASRK* 

ORF3, putative transcriptional activator protein Length: 314 amino acids 

1 MLDVRRIiHLL. RELDRRGTIA AVAEALTFTA SAVSQQLGVL EREAGVPLLE 

51 RSGRRWLTP AGRSLVAHAD AVLNRLEQAV AELAGARDGI GGPLRIGTFP 

101 SGGHTIVPGA LAELASRHPA LEPMVREIDS ARVSDGIiRAG ELDVALVHDY 

151 DFVPATPDTT VDEVPLLEEP MYLVTHAADT ATDSGSGSTL AALLGPCAEV 

201 PWITARDGTT GHAMAVRACQ AAGPQPRIRH QVNDFRTVLA LVAAGQGAGF 

251 VPRMAi^PSP AGWLTKZiPL FRRSKVAFRA GGGAHPAIAA FVAAATTAVE 
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301 RMAGSRGPAG GSE* 

ORF4, hypothetical protein Length: 139 amino acids 

1 MADDAYLFLL PDRHPRLGAA LAAVGALECT ETPAVHAWLQ AHEASVSSEQ 
51 VRILPADAET LIPKDAERLP VPLSEEEALK VEQECAPQTV TDMESELLAF 
101 RETTQDWQAL VHRALTAGIP AQRIARLTGL DPEEIGRL* 

ORF5, hypothetical protein Length: 208 amino acids

1 LAVAACAAW LPIDAWRIS AADVGVLVFF AYLLPYLAIT MTVFVSVAPE 
51 QVRSWARREA RGTFLQRYVL GTAPGPGGSL FIAAAALWA VLWLPGHLST 
101 TFSALPRTLV ALALWAAWI CVWAFAVTF QADNLVENER ALEFPGERSP 
151 AWADYVYFAL AAMTTFGTTD VDVTSRDMRR TVAANTVIAF VFNTVTVAIL 
201 VSALGGR* 

ORF6, hypothetical protein Length: 63 amino acids 

1 MTVMDKLKQM LKGHEDKAGQ GIDKAGDFVD GKTQGKYSGQ VDTAQDKLRD 
51 QFGSDQQEPP QR* 
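The stated lengths in Table II can be compared against the listings by stripping the position numbers and spaces and counting what remains. The sketch below does this for the ORF6 entry, copied from the listing above. The helper name is hypothetical, and the observation that the stated length counts the terminal `*` is an inference from this one comparison, not something the source states.

```python
import re

# ORF6 listing as printed in Table II (position numbers, 10-residue blocks).
ORF6_LISTING = """
1 MTVMDKLKQM LKGHEDKAGQ GIDKAGDFVD GKTQGKYSGQ VDTAQDKLRD
51 QFGSDQQEPP QR*
"""


def parse_sequence_listing(listing):
    """Remove position numbers and whitespace, keeping residues and '*'."""
    return re.sub(r"[0-9\s]", "", listing)


sequence = parse_sequence_listing(ORF6_LISTING)
residue_count = len(sequence.rstrip("*"))

print(sequence)
print(f"symbols including stop: {len(sequence)}, residues: {residue_count}")
```

The same check on other entries is only meaningful where the scan is clean, since a misread character changes the count.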

ORF7, hypothetical protein Length: 185 amino acids 

1 MGTAQSQEQA AAPGACAAFV RFVLCGGGVG LASSFAWAL ASWVPWALAN 

51 ALVAWSTW ATELHARFTF GAGGRATWRQ HAQSAGSAAA AYAVTCVAMF 

101 VLQQLVAAPG AVLEQWYLS ASAIAGVARF WLRLWFAR NRSLPAAAAV 

151 RTARPVRRVP APVPATVAHA ASRPAGPAAL CPAA* 

AcpX, acyl carrier protein (ACP) Length: 106 amino acids 

1 MTSTDHTSGQ DATELEKQIiA AATPEEREKL LTDTIRTQAG TLLNTTLSDD 
51 SNFLENGUSrS LTALELTKTL MTLTGMEIAM VAIVENPTPA QLAHHIiGQEL 
101 AHTTA* 
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KsX, 


ketoacyl-ACP synthase Length: 829 amino acids

1 VANEEKLVEY LKWTTAELHQ AQQQLRELKA AQHEPIAWS MACRLPGKTR
51 TPDDLWDLVS EGRDAVTGFP DDRAWELPEE RPYAELGGFL DDAAGFDAGF
101 FDISDTEAVA TEPLQRLMLH LAWETVERGH lAPHTLRSTL TGVYVGATGH
151 DYATRLETAP DELLPYLGGG TSGSLVSGRI AYALGLEGPA ISVDTACSSS
201 LVALHIiACQA LRRGECGLAL AGGGTVMSTP HTPHAFAHQK SLAQDGRCKP
251 FAAAADGMGL GEGVGLVLLE RLGDARKNGH PVLAVIRGSA VNQDGAGYGL
301 AAPNGPSQQH VIRAALADAG LTPDQIDAVE AHGTGTPIGt) AIEVQALLAT
351 YGADRSPDRP liWLGSVKSNT GHTQGAAGAA ALIKMVQAPR HGTLPPTLHV
401 DRPTPLAAWK KGAVRLLTEA VDWPRREEPR RVGISAFATS GTNAHLILEE
451 PPVDEAPVPD AARDQTSPVA PELPVAWSLS ARTPEALRAQ AKALVTHLAA
501 TDPAPSPAEV AYSLAATRSP LEHRAVLTGT DHTELLAAAR ALAAGEDHPD
551 LVRSTPGAGP KKIAWHFDGR PADGVTTGAA PGAKPGATPG ATFGAAFGGA
601 EFHSAFPLFA SAFDEARALL DTHLPTPLPT PHSELARFAV HTALARLLLE
651 TGVRPHTLTG DGVGHIAAAY AAGILTLDDA CRLAAAHAAA AQAAEGEQPA
701 PPDAYEPVLK QLTFQRATLT LTSTAPADTP lASADYWHHH LTSPAPTAPP
751 TPETHTLLHL GALSPEGTQT SAVSALLTAL ARLHTTGGTV DWTPLVRRTP
801 HPRTIDLPTY SFQATRYWLH DHTJ^HAAV*






MonCII, probable epoxyhydrolase/cyclase Length: 300 amino acids

1 VKNLRIPVSQ TVSLNVRYRP ADGPGAPGRP FLLLHGMLSN ARMWDEVAAR
51 LAAAGHPAYA VDHRGHGESD TPPDGYDNAT WTDLVAAVT ALDLSGALVA
101 GHSWGAHIiAL RliAAEHPDLV AGLALIDGGW YEFDGPVMRA FWERTADWR
151 RAQQGTTSAA DMRAYLRATH PDWSPTSIEA RLADYRVGPD GLLIPRLTST
201 QVMSIVAGLQ REAPADWYPK VTVPVRLLPL IPAIPQLSDQ VRAWVAAAEA
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251 ALEQVSVRWY PGSDHDLHAG APDEIAADLL LLARSCEAMP GGKAGVRPA* 

MonE, S-adenosylmethionine-dependent methyltransferase Length: 277
amino acids

1 VNKTVAPEPS DIGHYYDHKV FDLMTQLGDG NLHYGYWFDG GEQQATFDEA 
51 iVIVQMTDEMIR RLDPAPGDRV LDIGCGNGTP AMQLARARDV EWGISVSAR 
101 QVERGMRRAR EAGLADRVRF EQVDAMNLPF DDGSFDHCWA LESMLHMPDK 
151 QQVLTEAHRV VKPGARMPIA DMVYLNPDPS RPRTATVSDT TIYAALTDIG 
201 DYPDIFRAAG WTVLELTDIT RETAKTYDGY VEWIRAHRDE YVDIIGVEGY 
251 ELFLHNQAAL GKMPELGYIF ATAQRP* 

MonT, putative monensin resistance gene (ABC-transporter) Length: 512
amino acids

1 MSADLGARRW WAVGALVLAS MWGFDVTIL SLALPAMADD LGANNVELQW
51 FVTSYTLVFA AGMIPAGMLG DRFGRKKVLL TALVIFGIAS IiACAYATSSG
101 TFIGARAVLG LGAALIMPTT LSLLPVMFSD EERPKAIGAV AGAAMLAYPL
151 GPILGGYLLM HFWWGSVFLI NVPWILAFL AVSAWLPESK AKEAKPFDIG
201 GLVFSSVGLA ALTYGVIQGG EKGWTDVTTL VPCIGGLLAL VLFVMWEKRV
251 ADPLVDLSLF RSARFTSGTM LGTVINFTMF GVLFTMPQYY QAVLGTDAMG
301 SGFRLLPMVG GLLVGVTVAN KVAKALGPKT AVGIGFALLA AALFYGATTD
351 VSSGTGLAAA WTAAYGLGIiG lALPTAMDAA LGALSEDSAG VGSGVNQSIR
401 TLGGSFGAAI LGSIIjNSGYR GKLDLDGVPE QAHGAVKDSy FGGLAVARAI
451 KSNGLADSVR SAYVHALDW LWSGGLGLL GWLAWWLP RHVGQSTAKT
501 AESEHEAADA V*








MonRII, probable repressor protein Length: 192 amino acids

1 VPGLRERKKA RTKAAI<2REA VRLFREQGYT ATTIEQIAEA AEVAPSTVFR
51 YPATKQDLVF SHDYDLPFAM MVQAQSPDLT PIQAERQAIR SMLQDISEQE
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101 LALQRERFVL ILSEPELWGA SLGMIGQTMQ IMSEQVAKRA GRDPRDPAVR 
151 AYTGAVFGVM LQVSMDWAND PDMDFATTLD EALHYLEDLR P* 

MonAIX, thioesterase Length: 269 amino acids 

1 MDRGTAARAP QIGDEFGAAT GNGVWLRRYH AAAEAPVRLV CFPFAGGSAS 
51 YYFGLSGLLA PGVEVIiAVQY PGRQDRHAEP CLASVAELAD iSWPHLPCDG 
101 KPPALFGHSL GAIVAFEVAR RLRGPAGPGL PVHLFVSGGL ARPYRPAGRS 
151 GAPGDADILA HLRAMGGTDE,RFFRSPELQE LVLPALRADY RAVATYEAPG 
201 PGRLDCPITA LIGDADERTS PEQAATWRER TGAAFDLRVL PGGHFYLDGC 
251 QEQVAAWTE ALTAGPGV* 

MonAI, polyketide synthase multi-enzyme MONSl, housing loading module 
and extension module 1 Length: 3026 amino acids 

1 MAASASASPS GPSAGPDPIA WGMACRLPG APDPDAFWRL LSEGRSAVST 

51 APPERRRADS GLHGPGGYLD RIDGFDADFF HISPREAVAM DPQQRLLLEL 

lOi SWEi^LEDAGI RPPTLARSRT GVFVGAFWDD YTDVLNLRAP GAVTRHTMTG ' 

151 VHRSIliANRI SYAYHLAGPS LTVDTAQSSS LVAVHIACES IRSGDSDIAF 

201 AGGVNLICSP RTTELAAARF GGLSAAGRCH TFDARADGFV RGEGGGLWL 

251 KPLAAARRDG DTVYCVIRGS AVNSDGTTDG ITLPSGQAQQ DWRLACRRA 

301 RITPDQVQYV ELHGTGTPVG DPIEAAALGA ALGQDAARAV PLAVGSAKTN 

351 VGHLEAAAGI VGLLKTALSI HHRRLT^SLN FTTPNPAIPL ADLGLTVQQD 

401 LADWPRPEQP LIAGVSSFGM GGTNGHWVA AAPDSVAVPE PVGVPERVEV . 

451 PEPVWSEPV WPTPWPVSA HSASALRAQA GRLRTHLAAH RPTPDAARVG 

501 HALATTRAPL AHRAVLLGGD TAELLGSLDA Ll^GAETASI VRGEAYTEGR 

551 TAFLFSGQGA -QRIiGMGRELY AVFPVFADAL DEAFAALDVH LDRPLREIVL • 

601 GETDSGGNVS GENVIGEGAD HQALLDQTAY TQPALFAIET SLYRLAASFG 
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651 LKPDYVLGHS VGEIAAAHVA GVLSLPDASA LVATRGRLMQ AVI?APGAMAA 

701 WQATADEAAE QLAGHERHVT VAAVNGPDSV WSGDRATVD ELTAAWRGRG 

751 RKAHHLKVSH AFHSPHMDPI LDELRAVAAG LTFHEPVIPV VSNVTGELVT 

801 ATATGSGAGQ ADPEYWARHA REPVRFLSGV RGLCERGVTT FVELGPDAPL 

851 SAMARDCFPA PADRSRPRPA AIATCRRGRD EVATFLRSLA QAYVRGADVD 

901 FTRAYGATAT RRFPLPTYPF QRERHWPAAA GVGQQPETPE LPESSESSEQ 

951 AGHEREEGAR AWGGPEGRLA GLSVNDQERV LLGLVTKHVA WLGDASGTV 



1001 QAAHTFKQLG FDSMAAAELS ERLGTETGLP LPATLTFDYP TPIiAVAAHLR
1051 AELTGTPAPA GSAPATGALG AGDLGTDEDP VAIVAMSCRY PGGAGTPEDL
1101 WRLVADGADA IGDFPTDRGW DLARLFHPDP DRSGTSCTRQ GGFLYDAADF
1151 DAEFFDISPR EALAVDPQQR LLLECAWEAF ERAGLDPRAL KGSPTGVFVG
1201 MTGQDYGPRL HEPSQATDGY LLTGSTPSVA SGRLSFSFGL EGPALTVDTA
1251 CSSSLVTLHL AAQALRRGEC DLALAGGATV LATPGMFTEF SRQRGLAPDG
1301 RCKPFAAGAD GTGWAEGVGL VLLERLSEAR RKGHAVLAVI RGSAINQDGA
1351 SNGLTAPNGP SQQRVIRAAL AAARLTADEV DWEAHGTGT TLGDPIEAQA
1401 LLATYGQGRS AERPLWLGSV KSNIGHTQAA AGVAGVIKMV MA[y[RHDLLPA
1451 TLHVDEPSGH VDWSTGAVRL LTEPWWPRG ERPRRAAVSS FGISGTNAHL
1501 VLEEAGQDEY VAGAADDAGP VDGAVLPWW SGRTGAALRE QARRLRELVT
1551 GGS2^VSVSG VGRSLVTTRA VFEHRAVWG RDRDTLIGGL EALAAGDASP
1601 DWCGVAGDV GPGPVLVFPG QGSQWVGMGA QLLGESAVPA ARIDACEQAL
1651 SPYVDWSLTE VLRGDGRELS RVDWQPVLW AVMVSLAAVW ADHGVTPAAV
1701 VGHSQGEIAA WVAGALTLE DGAKIVALRS RALRQLSGGG AMASLGVGQE
1751 QAAELVEGHP GVGIAAVNGP SSTVISGPPE QVAAWADAE ARELRGRVID
1801 VDYASHSPQV DAITDELTHT LSGVRPTTAP VAFYSAVTGT RIDTA<5LDTD
1851 yWVTNLRRPV RFADAVTALL ADGHRVFIEA SSHPVLTLGL QETFEEAGVD
1901 AVTVPTLRRE DGGRARLARS LAQAFGAGCA VRWENWFPAT GTSTVELPTY
1951 AFQRRRYWLE APTGTQDAAG LGLiAAAGHPL LGAATEIADG DIRLLTGRIS
2001 RHSHPWLAQH TLFGAAWPA SVLAEWALRA ADEAGCPRVD DLTLRTPLVL
2051 PETAGVQVQI WGPADARDG HRDFHVYARP DGKDASEGEG lAEGEGASEG
2101 EGASGGTDAP WTCHADGRLV AEPTGTASED SPDTVWPPPG^^'AEPVDLGDFY
2151 ERAAATGVGY GPVFTGLRAL WRRDGELFAE AVLPQEAPET AGPGiVEHPALL
2201 DAALHPALLG ERPAEEDK\m LPFTLTGVTL WATGATSVRV RLTPLDDDPD
2251 ASADGRAWRV GVSDPTGAEV LTCEALVAVA AGRRELRAAG ERVSDLYAVE
2301 WVPVPGPGPV GEGADFSGWA GLGECGERWE CVGRVERWYE DLDALGAAVE
2351 GGASVPSWL ATAAAAPGGA GDGAADALSA VRWTGALLDQ WLADARFADA
2401 RLWITSGAV ATGDDFLPDP AAAAVRGLVE QAQVRHPGRI LLVDTEAGAG
2451 LGVGAGVDDA LLEQAVAMAL GADEPQIiAIiR AGRVLAPRLT APQDAAVTEA
2501 ARPLDPDGTV LITGPAGAPV ADIiAEHLVRT GQCRHLLLLP GDGELEEMAE
2551 ELRGLGATVD LSTADPADPT ALAEWAAVE GDHPLTGVIH ATGWDAFDP
2601 GDSASDLMID SASDSFAEAW SSRAGVTAAL HTATAHLPLD LFAVLSPAGA
2651 DIX^IARSAAA AGADAFSAAL ALRRHTTVTT DTTAPPRTTA PPRTTASPRT
2701 TALSSSRTTG VALAYGPPTA PRPGIKGTAP GRIPVLLDAA RAHGGGSPLL
2751 GARLAARALA AESAAEGVAG LPAPLRALAV AAAAAGAPTR RTAADRKPPA
2801 DWPARLAPLS APEQLRLLID AVRTHAAAVL GRTDPEALRG DATFKQLGLD
2851 SLTAVELRNR LVEDTGLRLP TALVFRYPTP AAIAAHLRER LTSPSETTAT
2901 QRSGGQTPAA GQASSALAPG GSAAGPPAAD TVLSDLTRME NTLSVLAAQL
2951 PHTETGEITT RLEALLTRWK TTNATANDSG DGNGGDDDAA ERLKAASADQ
3001 IFDFIDNELG VGHGTSRVTP TPKAG*







-76- 



SUBSTITUTE SHEET (RULE 26) 



wo 01/68867 



PCT/GBOO/02072 



MonAII, polyketide synthase multi-enzyme MONS2, housing extension
module 2 Length: 2239 amino acids

1 MASEEQLVEY LRRVTTELHD TRRRLVOEED RRQEPVALVG MACRFPGGVA
51 SPEDLWDLVA AGKDAIEDFP TDRGWDLEAL YDPDPAAYGT SYVRHGGFVD
101 DAGSFDADFF GISPREAIiAM DPOORLMLET SWELFERAGI EPVSLKGSRT
151 GVYAGVSSED YMSQLPRIPE            LTSVISGRVA YNYGLEGPAV
201 TVDTACSASL VAIHLASQAL RORECDIxALA GGVLVIiSSPL MFTEFCRQRG
251                       EGIGLLLLER LSDARRNGHK VLAVIRGSAV
301                       IRAALDNARL TPSEVDAVEA HGTGTKLGDP
351 lEAGALLATY GQHRARPLLL GSLKSNIGHT HATAGVAGVI KTVMAIRNGL
401 LPATLHVEEL SPHVDWDAGA VEWTEPTPW PETGHPRRAG VSAFGISGTN
451 AHLILEEAPP EEDVPAPVW ESGGWPWW SGRTPEALRE QARRLGEFVA
501 GDTDALPNEV GWSLATTRSV FEHRAVWGR DRDALTAGIiG ALAAGEASAG
551 WAGVAGDVG PGPVLVFPGQ GAOWVGMGAO LLDESAVFAA RIAECERALS
601 AHVDWSLSAV LRGDGSELSR VEWOPVIjWA VMVSLiAAVWA DYGVTPAAVI
651 GHSQGEMAAA CVAGALSLED AARTVAVRJ5D ALRQLQGHGD MASLSTGAEQ
701 AAELIGDRPG VWAAVNGPS STVISGPPEH VAAWADAEA RGLRARVIDV
751 GYASHGPQIp QLHDLLTERL ADIRPTNTDV AFYSTVTAER LTDTTALDTD
801 YWVTNLRQPV RFADTIEALL ADGYRLFIEA SAHPVLGLGM EETIEQADMP
851 ATWPTLRRD HGDTTQLTRA AAHAFTAGAD VDWRRWFPAD^ PAPRTIDLPT
901 YAFQRRRYWL ADTVKRDSGW DPAGSGHAQL PTAVALADGG WLNGRVSAE
951 RGGWLGGHW AGTVLVPGAA LVEWVLRAGD EAGCPSLEEL TLQAPLVLPE
1001 SGGLQVQWV GAADEQGGRR DVHVYSRSEQ DASAVWQCHA VGELGRASVA 
1051 RPVRQAGQWP PAGAEPVEVG GFYEGVAAAG YEYGPAFRGL RAMWRHGDDL 
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1101 LAEVELPEEA GSPAGFGIHP ALIiDAALHPL LAQRSRDGAG AGAHGGQVLL 

1151 PFSWSGVSLW ASEATTVRVR LTGLGGGDDE TVSLTVTDPA GGPWDVAEL 

12 01 RLRSTSARQV RGSAGPGADG LYELRWTPLP EPLPVPAPAN GRDVAADLSG 

1251 CAVLGELVAE PGPGIDLEGC PCYPGVGALA DNASPPSMIL APVHSDTTGG 

1301 DGIiALTERVL RVIQDFIiAAP SLEQKQTRLA FVTRGAADTG STTGGSAAPA 

1351 EAVDPAVAAV WGLVRSAQSE NPGRFVLLDT DAPIiDQASVA'^^LVDAVRSAV 

14 01 EADEPQVALR GGRLLVPRWA RAGEPVELAG PAGARAWRLV GGDSGTLEAV 

1451 VAEACDDIVL RPLAPGQVRV AVHTAGVNFR DVLIALGMYP DPDALPGTEA 

1501 AGWTEVGPG VTRLSVGDRV MGMMDGAFGP wavadarmla pvppgwgtrq 

1551 AAAAPAAFLT AWYGLVELAG LKAGERVLIH AATGGVGMAA VQIARHVGAE 

1601 VFATASPGKH AVLEEMGIDA AHRASSRDLA FEDAFRQATD GRGVDWLNS 

1651 LTGELLDASL RLLGDGGRFV EMGKSDPRDP ELVALEHPGV SYEAFDLVAD 

1701 AGPERLGLML DRLGELFAGG SLVPLPVTAW PLGRAREALR HMSQARHTGK 

1751 LVLDVPAPLD PDGTVLVTGG TGTIGAAVAE HLARTGESKH LLIVSRSGPA 

1801 AHGAEELVSR lAEFGAEATF VAADVSEPDA VAALIEGIDP AHPLTGWHA 

1851 AGVLDNALIG • SQTTESLTRV WTU^KAAAAQQ LHEATRESRL GLFVMFSSFA 

1901 STMGTPGQAN YSAANAYCDA LAALRRAEGL AGLSVAWGLW EATSGLTGTL 

1951 SAADRARIDR YGIRPTSAAR GCALLAAARA HGRPDLLAMD LDARVPAASD 

2001 APVPAVLRTL AAAGAPATAR PTAAAAADGA TDWSGRLAGL TEEARLELLT 

2051 ELVCTHAAGV LGHM)AGAVQ VDAPFKELGF DSLTAVELRN RIAAATGLKL' 

2101 PAALVFDYPQ ARVLAAHLAE RLVPEGAGAM GGVSGAEGVR DAYGAGGPGG 

2151 DMTAQVLLEV ARVEHTLSAA VPHGLDRAAV AARLEALLAR CTATTAATGA 

2201 AGAAVEGDGD SDGDGAVDQL ETATAEQVLD FIDNELGV* 

MonAIII, polyketide synthase multi-enzyme MONS3, housing extension
modules 3 and 4 Length: 4133 amino acids






1. 


MVSEEKLVDY 


LKRVSADLHA 


TRQRLREAEE 


RGQEPVAWE 


AACRYPGGIR 


51 


TPEDLWDLVA 


AGGNALGAFP 


DNRGWDLRRL 


FHPDPDHPGT 


TYAREGGFLH 


101 


DADLFDPEFF 


GISPREAAVL 


DPQQRLLLEC 


AWEALERAGI 


DPRSLQGSRT 


151 


GVYAGAALPG 


FGTPHIDPAA 


EGHLVTGSAP 


SVLSGRLAYT 


FGLEGPAVTI 


201 


DTACSSSLVA 


VHLAAHALRQ 


RECDLAIiAGG 


VTVMTTPYVF 


TEFSRQRGLA 


251 


ADGRCKPFAA 


AADGTAFSEG 


AGLLVLBRLS 


DARRAGHRVL 


AVIRGSAVNQ 


301 


DGASNGLTAP 


NGPAQQRVIR 


AALAGARLSP 


AEVDAVEAHG 


TGTI^GDPIE 


... 351 


ADALLATYGQ 


ERHGGRPLWL 


GSVKSMIGHT 


QGAAGAAGLI 


KMVQALRHET 


401 


LPATLYADEP 


TPHADWESGA 


•VRLLSAPVAW 


PRGEHGEHTR 


RAGISSFGIS 


451 


GTNAHLILEE 


APAADAEGAG 


GDGDGDGGGV 


RPWRVGATG 


PREEQGQGQG 


501 


QEQHQQQRQQ 


RQRSSMMPTP 


HLPWLLSARS 


PAALRAQADA 


LANHVAHADH 


551 


SIADIGGTLL 


RRTLFEHRAV 


VLGTDRDERA 


AALAALAAGR 


AHPALTRAAG 


601 


PARNGGTAFL 


FTGQGSQRPG MGRQLYDTFD 


VFAESLDETC 


ARLDPLLEQP 


651 


LKPVLFAPAD 


TAQAAVLHGT 


GMTQAALFAL 


EVALYRQVTS 


FGIAPSHLTG 


701 


HSVGEIAAAH 


VAGVFSLADA 


GTLVAARGRL 


MQALPAGGAM 


LAVQAAEDDV 


751 


LPLIAGQEER 


LSLAAVNGPT 


AVWSGEAAA 


VGEVEKALRG 


RGLKTKRLNV 


801 


SHAFHSPLIE 


PMLDDFREVA 


RGLTFHAPTL 


PWSNLTGRL 


ADAEIjMADAE 


851 


YWVRHVRRPV 


RFHDGLRALS 


EQGWRYLEL 


GPDPVLATMV 


QDGLPAPAEG 


901 


EEPEPWAAA 


LRSKHDEGRT 


LLGAVAALHT 


DGQPADLTAL 


FPADAGQVPL 


951 


PTYRFQRRRY 


WRVAPDAAAP ARAAGLQETG 


HPLLPAVIRQ 


ADGGILLAGR 



1001 LSLRTHPWLA DHTIAGGVPL PATAPVELAL LAGRHAACDT IDDLTLBTPL 
1051 LLDDTGTGVG AAVGAGADAL VDAIEVQLAL GAPDGSGRRA LTVHSRPADD 
1101 AADDGDAADA ADAAGRGGPG GSGDLGDPGD PGDLGDGGGS RGWRRHATGI 
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1151 


LSAGPAAEPA 


APDAAPWPPA 


DATALDVDAL 


YARLDAQGYS 


YGPAFRAVHA 


1201 


AWRHGDDLYA DVRLADEQRA 


EADAFALHPA 


LLDAALHAVD 


ELYRGSEGRG 


1251 


QEQGQGGQEP 


EQGRGDADAP 


VRLPPSFSDI 


RHHATGATRL 


WVRLSPQGDD 


1301 


RLRLSLTDGE 


GGQVATVDAL 


QLRLIPADRW 


RAARPTTAAP 


LYHLDWHELP 


1351 


LPEPAETDPA 


AHSWAVLGAH 


DAGIiAPAAHY 


PDLAALKAAV 


EAGEPVPDIV 


1401 


FAPFPAQGTE TDVPAQVRAH ARHALELLRD WLTTEAFAAA "kLVVLTTGAV 


1451 


TARPEDGPAD 


LATAPVWGLV RAAQAEQPDH 


WLVDIDKDI 


DKDTDEETDQ 


1501 


ATDAGTASRH ALPAALAA^ AQAETQLALR AGTVLVPRLA WPPRTDTPA 


1551




DTVT) TO T AG 




X vjrvj X wVJXJVjSi/xTk. 


VA P WT I A A A 


1601 


ARHLLLVSRR 


GDAAEGVAEL 


RADLADDGVD 


V x\. V *xr^v_»j-^ X X X/ 


PDAT.AnT.T.An 


1651 


IPAAHPLTAV 


VHTAGVIDDS 


LITAMTPERL 


n A*\7T . a p "K" A n A 


ATaTWT.PT'T .T*P'n 
/llrVXliorLCixj X rU-J 


1701 


KDLSAFVLPS 


SGASVLGNGG 


QANYAAMTTF 


T .XFTT . APTTPP A 


A HT . A A T Q^TA "M 


1751 


GLWESASGGM 


■AARLGDADRA 


RIHRTGVTGL 


TDEQALALFD 


AALTAEHPTV 


1801 


LATRFDRAVL 


RGQAAARTLQ 


PALRGLVRTP 


RPTASAGAIG 


STAATGSATD 


1851 


BNAPS.SWAAR 


LARIiSAADRD 


RAIiNELIREQ 


lATVLAHPSP 


DTIELGRAFQ 


1901 


ELGPDSLTAL 


ELRNRLSTAT 


GIRLPATLVF 


DHPSPTALVR 


HLHSHLPDEA. 


1951 


QHTSPTAPGA 


SAEGTAATAT 


GIDDDPIAIV 


GMACRYPGGV 


TSPEQLWQLV 


2001 


ATGTDAIGPF 


PEDRGWDTAG 


LFDPDPDQVG 


HSYTREGGFL 


YDAARFDAGF 


2051 


FGISPREAAA 


TDPQQRLLLE 


TAWQAFEHAG 


IDPAALRGTP 


CGVITGIMYD 


2101 


DYGSRFIiARK 


PDGPEGRIMT 


GSTPSVASGR 


VAYTFGLEGP 


AITVDTACSS 


2151 


SLVAMHIiAAQ ALRQGECELA LAGGVTVMAT 


PNTFVEFSRQ 


RGIiAPDGRCK 


2201 


PFAAAADGTG 


WGEGAGLWL 


ERLSDARRKG 


HRVIALLRGS 


AVNQDGASNG 


2251 


MTAPNGPSQE 


RVIRTALAGA 


GRGPEDIDW 


EAHGTGTTLG 


DPIEAQAIiLA 


2301 


TYGQGRPEDR 


PLWLGSVKSN 


IGHTQAAAGV AGVIKMVMAL 


RHEQLPTTLH 
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2351 


ADEPTPHVQW 


DGGGVRLLTE 


PVPWSRGERT 


RRAGVSSFGI 


SGTNAHLILE 


2401 


EPPEEDLPEP 


VAAEPGGWP 


WWSGRTPDA 


LREQARRLGE 


FWGAGDVSA 


2451


AEVGWSIiATT 


RSVFEHRAW 


AGRDRDDLVA 


GMQALAAGET 


PTDWSGAAA 


2501 


SSGAGPVLVP 


PGQGSQWVGM 


GAQLLDESPV 


PAARIAECEQ 


ALSAYVDWSL 


2551 


SDVLRGDGSE 


LSRVEWQPV 


LWAVMVSLAA 


VWADYGVTPA 


AWGHSQGEM 


2601 


AAACVAGALS 


LEDAARIVAV 


RSDALRQLQG 


HGDMASLGTG 


AEQAAELIGD 


2651 


RPGVWAAVN 


GPSSTVISGP 


PEHVAAWAE 


AEARGLRARV 


IDVGYASHGP 


2701 


QIDQLHDLLT 


EGLADIRPAN 


TDVAFYSTVT 


AERLTDTTAL 


DTDYWVTOTiR 


2751 


QPVRFADTIE 


ALIiADGYRLF 


lEASAHPVLG 


LGMEETIEQA 


DIPATWPTL 


2801 


RRDHGDTTQL 


TRAAAHAFTA 


GADVDWRRWF 


PADPTPRTVD 


LPTYAFQHQH 


2851 


yWLEEPSGLT 


GDAADIiGMVA 


AGHPLLGACV 


ELAESbSYLF 


TGRLSRRAPS 


2901 


WLAEHWAGT 


VLVPGAALVE 


WVLRAGDEAG 


CPTIEELTLQ 


APLVLPESGG 


2951 


LQVQWVGAT 


DEQSGRRDVH 


VYSRSEQDAS 


AVWVCHAVGV 


VSSEMPEAAA 


3001 


EL-SGQWPPAG 


AEAVDVEDFY 


ARAAEAGYAY 


GPAFQGLRAL 


WRHGTELFAE 


3051 


•WLPEQAGGH 


DGFGIHPALL 


DAALHPLMLL 


DRPADGQMWL 


PFAWSGVSm 


3101 


ADRATHVRVR 


LSPRGEAAER 


DIjRWIADAT 


GAPVLTVDAL 


TLRAADPGRL 


3151 


GAAARGGVDG 


LYTVDWTPLP 


LPQPLPLPRT 


DAGGSADWI 


LSDNSSAALA 


3201 


DAVSSATAAG 


GGAPWALLAP 


VGGGSADDGL 


PWRRTLSLV 


QEFLAAPELT 


3251 


ESRLVIVTRG 


AVATDADGDV 


AASAAAVWGL 


IRSAQSENPG 


RPVLLDVEEE 


3301 


HLHPDGGELP 


YAAL-RHAVEE 


LDEPQLAIiRS 


GKFLVPRMTP 


AAAPEELVPP 


3351 


VGTSGWRLGT 


SGTATLEISTLS 


VIDAPEAFAP 


LEPGQVRISV 


RAAGMNFRDV 


3401 


LIALGMYPDK 


GTFAGSEGAG 


HVTEVGPGVT 


HIiSVGDRVMG 


LFEGAFAPIiA 


3451


VADARMWPI 


PEGWSFQEAA AVPWFLTAW YGLVDLGRLR 


AGESLLIHAG 


3501 


TGGVGMAATQ 


lARHLGAEVF 


ATASPAKHGV 


LDGMGIDAAH . 


RASSRDLDFE: 
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7 ^ c; 1 

-3 O O i 


T^TPT P A A TrZf^T? 
Ci X Xj IV/lrt. X VjvjK. 


v:ri*lLJ V V XlLM oXLri. 


m^PTTiAQT PT 
VjJZiX" XXfiloXiKXj 


T A T^^/^PA/rrrr^TVff 
Xi/iiliLjijKiyi VX^lYl 


CjixX JJivKJDPDR 


J D U X . 




Ss-r\r X/Xj V JriXfiVa 


Jr X^K X vjriiiyiXLftJii 


T r^TPT PA 0/^7\ T 

Xii^jJiXir AotjfiXj 


AFXjF V y TWPL 


J O O i 


rip A T? 17 A "CM 


S \^/iI\Jl, X VjrrVXi V 


T .PT OP AT .PvPTl 
XiJi X ir Iri-U-iX/ jrXJ 


GTVLITGGTG 


VliAAAVAEHL 


J / VJ X 


V JXli VV V rCtlXjXi 


T.Ar^PPr^QP A P 
XiriVjlCtCijO JCiioJr 




ELGAEVTFAA 


ADVSDPDAVA 


J / O J. 


ill Xi V urix 1 X^ ±r/ari 


PLTGVIHAAG 


VLDDAWTAQ 


TPESLARVWA 


AKATAAHLIiH 


O O U X 


ii/i X x\Ji/-i_rCXjVjTXj 


FLVFSSAAAT 


LGSPGQANYA 


AANAYCDALV TfeQRRAEGLAG 


^ O ^ X 


Xl o X *cr Vh vjrXi ViH>^ X 


ASGMTGHLGE 


TDLARMKRTG 


FTPLTTEGGL 


ALLDAARAHG 


^ 17 U J. 


XvXrXlV VAVXyXU-J 


ARAVAAQPA& SRPALLRAIA 


AGATPGARTA 


RRTAAAGSVA 


3951 


PAGGLADRLA 


GLPHPERRRL 


IiLDLVRGNVA 


GVLGHSDHDA 


VRPDTSPKEL. 


4001 


GPDSLTAVEL 


RNRLAAATGL 


KLPAALVFDY 


PESATLVDHL 


LERLSPDGAP 


4051 


PPVICDAADPV 


LNDLGRIESS 


LDALALDADA 


RSRVTRRLNT 


LLSKLNGAAT 


4101 


AGSPADVTDL 


DALDALDDVS 


DDEMFEFIDR 


EL* 





MonAIV, polyketide synthase multi-enzyme MONS4, housing extension 
modules 5 and 6 Length: 4039 amino acids 



1 MSSAEESSPD VSGTGVSGTG ESATGTSSTE AKLRQYLKRV TVDLGQARRR 



51 LREVEERAQE PIAIVSMACR FPGDTRTPEA LWDLVAEGGD AIDDFPTNRG
101 WDLESLYHPD PDHPGTSYVR RGGFLYDAPA FDASFFGISP REALAMDPQQ
151 RVLMETAWQL LERAGIDPAS LKLSATGVYI GAGVLGFGGA QPDKTVEGHL
201 LTGSALSVLS GRISFTLGLE GPSVSVDTAC SSSLVSMHLA AQALRQGECD
251 LALAGGVTVM STPGAFTEFS RQGALSPDGR SKAFAASADG TGFSEGAGLL
301 LLERLSDAIIR NGHKVLAVIR GSAVNQDGAS NGLTAPNGPS QERVIRAALA
351 NAGLGAAEVD AVEAHGTGTK LGDPIEAGAL LATYGRDRDE DRPLWLGSVK
401 SNIGHPQGAA GVAGVIKMVM ALQRELLPAT LYVDEPTPHV DWSSGSVRLL
451 TEPVPWTRGE RPRRAGVSAF GMSGTNAHVI LEEAPPEEAA AAETPAEGTG
501 AWPWWSGR GEEALRAQAA QLAEHVRDDD ^RPASPLEVG WSLATTRSVF
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• 551 ENRAVWGDD RDALLDGLRS LAAGEASPDV VSGAVGPTGP GPVMVFPGQG 

601 GQWVGMGARL LDESPVFAAR lAECEQALSA YVDWSLTDVL RGDGSELARI 

651 DWQPVLWAV MVALAAVWAD QGIEPAAWG HSQGEIAAAC WGAISLDEA 

701 ARIVAVRSVL LRQLSGRGGM ASLGMGQEQA ADLIDGHPGV WAAVNGPSS 

751 TVISGPPEGI AAWADAQER GLRARAVASD VAGHGPQLDA ILDQLTEGIA 

801 GIRPAATDVA FYSTVTAGHL TDTTELDTAY WVRNVRRTVR FADTIDALLA

851 DGYRLFIEVS PHPVLNLALE GLIERAAVPA TWPTLRRDH GDTTQLARAA

901 AHAFAAGADV DWRRWFPADP APRTVDLPTY AFQRQDFWPA PAGGRSGDPA

951 GLGLAASGHP LLGASVGLAS GDVHLLSGRV SRQSAAWLDD HWAGQALVP

1001 GAAQVEWVLR AGDDAGCSAL EELTLQTPLV LPDTGGLRIQ WVEAADAHG 

1051 RRDVRLFSRP DDDDAFASTH PWTCHATGVL APAPTDGTNG TRDAADTLDG 

1101 AWPPADAEPV PADDLYAQAD RTGYGYGPAF RGVRALWRHG KDVLAEVTLP 

1151 KEAGDPDGFG IHPALLDAVL QPAALLLPPT DAEQWLPFA WNDVALHAVR

1201 ATTVRVRLTP LGERIDQGLR ITVADAVGAP VLTVRDLRSR PTDTGRLAAA

1251 ATRDRHGLFD LEWIAPENAA BNAAGPARDA SEGWVTLGED AASLADLLAS

1301 VEAGAPAPQL VAAPVEPDRT DDGLALATHV LDLVQTWLAS PLHDSRLVLV 

1351 TRGAVTDADV DVAAAAVWGL VRSAQSEHPG RFTLIDLGPD DTLAAAMQAA 

1401 HLEEPQLAVH GGEIRVPRLV RATTDPTAPN GTPEADRTAD PSEGLHRNGT

1451 VlilTGGTGVL GRLVAEHLVT EWGVRHLLLA SRRGDQAPGS AELRARLSEL 

1501 GASVEIAPAD VGDAEAVAAL lASVDPAHPL TGVIHAAGVL DDAVITAQTP 

1551 ESLARVWATK ATAARHLHEA TRETPLDFFV VFSSAAASLG SPGQAlSiYAAA 

1601 NAYCDALVQH RRAQGLAGLS lAWGLWQATS GMTGQLSETD LARMKRTGFA 

1651 AliTDEGGLAL LDAARAHDRA YWAADLDPR AVTDGLSPLL RALTAPATRR

1701 RVASEGLADG ALATRLAGLD ADGRLRLLTD WREYVAAVL GHGSAARVGV 
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1751 


DIAFKDLGFD 


SLTAVELRNR 


LSAACDVRLP 


ATLIFDHPTP 


QALiATHLVDR 


1801 


IiAGSTSATTT 


VNATAPAAAH 


VAAGADVDAD 


TDDPVAIVAM 


TCRFPGGVAS 


1851 


PDDLT^DLLDA 


RKDAMGAFPT 


DRGWDLERLF 


HPDPDHPGTS 


YTDQGGPLPD 


1901 


AGDFDAAFFG 


INPREALAMD 


PQQRLLLEAS 


WEVLERAGID 


PTTLKGTPTG 


1951 


TYVGLMYHDY AKSFPTADAQ 


LEGYSYLAS.T 


GSMVSGRVAY 


TLGLEGPAVT 


2001 


VDTACSSSLV SIHIaATQALR HGECDLALAG GVTVMADPDm"*FAGFSRQRGL 


2051 


SPDGRCKAYA 


AAADGVGFSE 


GVGVLLLERL 


SDARRHGRRV 


LGWRGSAVN 


2101 


■QDGASNGLTA PNGPSQERVI 


RQAIxASGGLS 


SVDVDWEGH 


GTGTTLGDPI 


2151 


EAQALLATYG 


QGRPEDRPLW - LGSVKSNIGH 


TQAAAGVAGV 


IKMVMAMRHG 


2201 


WPASLHVDV 


PSPHVEWDSG 


AVRIAVESVP 


WPQVEGRPRR AGVSSFGASG 


2251 


TNAHVIVESV 


PDGIiEEDSVS 


VGGEALETET 


DGRLVPWWS 


ARSPQALRDQ 


2301 


ALRLRDFASD 


ASFRAPLADV 


GWSLLKTRAL 


HEHRAWVGA 


ERAELIAALE 


2351 


AliATGEPHAA 


LVGPACSQAR 


VGGDDWWLF 


SGQGSQLVGM 


GAGLYERFPV , 


2401 


FAAAFDEVCG 


LLEGPLGVEA 


GGLREWFRG 


PRERLDHTVW AQAGLFALQV 


2451 


GIiARIiWESVQ 


VRPDWLGHS 


IGEIAAAHVA 


GVFDLADACR -WGARARLMG 


2501 


GLPEGGAMCA 


VQATPAELAA 


DVDGSAVSVA 


AVNTPPS-TVI 


SGPSDEVDRI 


2551 


AGVWRERGRK 


TKALSVSHAF 


HSALMEPMLA 


EFTEAIRGVK 


FRQPSIPLMS 


2601 


NVSGERAGEE 


ITDPEYWARH VRNAVLFQPA 


lAQVADSAGV 


FVELGPAPVL 


2651 


TTAAQHTLDE 


SDSQESVhVA 


SLAGERPEES 


AFVEAMARLH 


TAGVAVDWSV 


2701 


LFA-GDRVPGL 


VELPTYAFQR 


ERFWLSGRSG 


GGDAATLGLV 


AAGHPLLGAA 


2751 


VEFADRGGCL 


LTGRLSRSGV 


SWLADHWAG 


AVLVPGAALV 


EWALRAGDEV 


2801 


GCVTYEELML 


QAPLVyPEAS 


GLRVQWVEE AGEDGRRGVQ 


I.YSRPD7U:iAV 


2851 


GGDDSWICHA 


TGVLSPESAR 


liDTELGGVWP 


PAGAEPLDVD 


GFYAQAGEAG 


2901 


YGYGPAFRGL 


RAVWRHGQDL 


LAEWLPEAA 


GAHDGY<3IHP 


ALLDATLHPL 






2951 


LAARFMDGSE 


DDQLYVPFGW 


AGVSLRAVGA 


TTVRVRLRPV 


GESVDQGLSV 


3001 


TVTDATGGPV 


LSVDSLQTRP 


VKPSQLAAAQ 


QPDVRGLFTV 


EWTPLPQTDA 


3051 


DGEADWWLS 


DGVGRLADW 


SAAGGEAPWA 


WAPVDASVG 


DGREGLDGRL 


3101 


WERVLSLVQ 


EFLALPELAE 


SRLLWTRGA 


VATGVDGDGD 


VDASAAAVWG 


3151 


LVRSAQSENP 


. GRFILLDVDG 


DGDDQGPDIiN 


GRHLPHATLR 


HAAEELDEPQ 


3201 


liALREGTLYV 


PRLTQARQSA 


ELWPPGEPA 


WRLRMVHDGS 


LDAIiAAVACP 


3251 


EALEPIiAPGQ 


VRIAVHAAGI 


NFRDVLVALG 


MVPAYGAMGG 


EGAGWTEVG 


3301 


PEVTHVSVGD 


RVMGVFEGAF 


GPWIAEARM 


VTPVPQGWDM 


REA?IGIPAAF 


3351 


LTAWYGLVEL 


AGLKAGERVL 


VHAATGGVGM 


AAVQIARHVG 


AEVFATASPG 


3401 


KHAVLEEMGI 


DAAHRASSRD 


LAFEGTFREA 


TGGRGMDWIi 


NSLAGEFIDA 


3451 


SLRLLGDGGR 


FLEMGKTDVR 


AAEEVAAEHA 


DVSYTAYDLV 


GDAGPDRISN 


3501 


MLDKLVELFA 


SERLKPLPVR 


SWPLDKAQEA 


PRFMSQAKHT 


GKLVLEIPPA 


3551 


LDPEGTVLVT 


GGTGALGQW 


AEHLVREWGV 


RHLLIxASRRG 


PEAPGSDELA 


3601 


SKLTGLGAEV 


TIVAADVSDP 


ASWELVGKT 


DPSHPLTGW 


HAAGVLEDGV 


3651


VTAQTPEGIiA 


RWAAKAAAA 


ANLHEATREM 


RLGLFWFSS 


AAATLGSPGQ 


3701 


ANYAAANAYC 


DAmQHRRAV 


GQVGIiSVGWG 


LWEAPDAKPG 


VAADAKASAA 


3751 


TVGKASALSD 


GTNGSAPQDT 


TGTAPQGMTG 


GLTDTDVARM 


ARIGVKGMSN 


3801 


AHGIiAXiFDAA 


HRHGRPHLVG 


FNLDLRTIiAT 


HPIiHTRPALL 


RGIiATPTAGG 


3851 


ASRPTATAGG 


QPADLAGRLA 


ALSPSDRHHT 


LVRLIREQAA 


TVLGHHPDSL 


3901 


TTGSTFKEIiG 


FDSLTAVELR 


NRLSAATGLR 


LPAGLVFDHP 


DADILAEHLG 


3951 


AQLAPDGDTP 


AGAEATDPVL 


RDLAKLENAIi 


SSTLVEHLDA 


■DAVTARIiEAL 


4001 


LSNWKAASAA 


PGSGSTKEQL 


QVATTDQVLD 


FIDKELGV* . 





MonAV, polyketide synthase multi-enzyme MONS5, housing extension 
modules 7 and 8 Length: 4107 amino acids 
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1 


MASEEELVDY- LKRVAAELHD TRORTiRRVRD Ttv>DVP\TTy\nrr< 


tvynv TTiT^ /"* T T~i 

r^lACKb PGGlE 


51 


TPEGLWELVA AGDDAIEPFP TDRGWDLEGI YHPDPDHPRT 




101 


APDRFDSDFF GFSPREALAS SPQLRLLLET SWEALERAGI 


NPASLKGSPT 


151 


GVYVGAATTG NQTQGDPGGK ATEGYAGTAP SVLSGRLSFT 


LGLEGPAVTV 


201 


ETACSSSLVA MHIiAANALRQ GECDLALAGG VTVMSTPEVF. 


TGFSRQRGLA 


251 


PDGRCKPFAA AADGTGWGEG AGLILLERLS DARRKGHKVL AViRGSAINQ 


301 


DGASNGFTAP NGPSQRRVIR QALSSAHLST SEIDWEAHG 


TGTRLGDPIE 


351 


AEALIATYGK EREDDRPLWiTgSVKSNIGHT QAAAGVAGVI 


KMVMALQREL 


401 


LPATLNVDEP TPHVQWEGGG VRLLTEPVPW SRGERPRRAG 


ISSFGISGTN 


451 


AHWLEEAPP EEDVPGPVAA EPEGWPWV SARTEEALSE 


QARRLGEFVA 


501 


DTDPSTADVG WSLTTSRAIL EHRAWVGRD RDALTAGLAA 


LAAGEESADV 


551 


VAGVAGDVGP GPVLVFPGQG SQWGMGAQL LDESPVFAAR 


lAECEQALSA 


601 


YVDWSLSAVL RGDGSELSRV EWQPVLWAV MVSLAAVWAD 


YGVTPAAVIG 


651 


HSQGEMAAAC - VAQALSLEDA . ARWAVRSDA LRQLMGQGDM 


ASLGASSEQA 


701 


AELIGDRPGV CIAAVNGPSS TVISGPPEHV AAWADAEER 


GLRARVIDVG 


751 


YASHGPQIDQ LHDLLTDRLA DIRPATTDVA FYSTVTAERL 


TDTTALDTDY 


801 


WVTNLRQPVR FADTIDALLA DGYRLFIEAS AHPVLGLGME 


ETIEQADIPA 


851 


TWPTLRRDH GDTTQLTRAA AHAFTAGATV DWRRWFPADP 


TERTIDLPTY 


901 


AFQRRSYWLP VDGVGDVRSA GLRRVEHSLL PAALGLADGA 


LVLTGRIiAAS 


951 


GGGGGWLADH AVAGTTLVPG AALVEWALRA ADEAGCPSLE 


ELTLQAPLVL 


1001 


PGSGGLQVQV WGPADGQGG RREVRVFSRV DSDDEAAGQD EGWSCHATGV 


1051 


LSPEPGAVPD GLSGQWPPTG AEPLEISDLY EQAASAGYEY 


GPSFRGLRSV 


1101 


. WRHGHNLLAE. VELPEQAGAH DDFGIHPVLL DAALHPALLL 


DQNAPGEEQE 


1151 


PAQPALRLPF VWNGVSLWAT GAATVRVRLA PHGGGETDDS 


AGLRVTVADA 
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1201 


TGAPVLSVDS LiALRPADPEL LRTAGRAGSG TNGLFTVEWT ALPPADVADH
1251 AAGDGWAVLG QDVPDWAGAD MPRHPDMASL SAALDEGTQA PAAVFVETTA
1301 TSHATPNTAA DVTLlDASGRA VAERTLHLLR DWLAEPRLAE TRLVLITHHA
1351 VTTPADDDVN AAPLDVPAAA LWGLIRSAQA EHPDRFVLLD TDAKANTDPG
1401 PDTSTDHSTA SGTYRTVIAR ALATGEPQLA VRAGELLAPR liARAATPTPE
1451 TPTPETQPDT GSGSEAGAGS GSGPGATLDP DGTVLIAGGT GMMGGLVAEH
1501 LVRAWSVRHL LLVSRQGPDA PDARDLADRL VGLGATVRIV AADLTDGRAT
1551 ADLVASVDPA HPLTGVIHAA GVLDDAWTA QTSDQLARVW AAKASVAANL
1601 DAATSELPLG LFLMFSSAAG VLGNAGQAGY AAANAFVDAL VGRRRATGLP
1651 GliSIAWGLWA RGSAMTRHLD DADLiARLRAG GVKPLLDEQG LALLDAARAT
1701 AAHTSLWAA GIDVRGLNRD DVPAILRDLA GRTRRRAAAD STVDQAALER
1751 RLTGLDEAER RA.VVTDWRE CVAAVLGHRS AADVRTEANF KDLGFDSLTA
1801 VQLRNRLSAA SGLRLPATLA FDHPTPQALA AYLGTRLSGR TATPVAPVAP
1851 SAAATDEPVA IVAMACKYPG GATSPEGLWD LVAEGVDAVG AFPTGRGWDL
1901 ERLFHPDPDH PGTSYADEGA FLPDAGDFDA AFFGINPREA LAI^PQQRLL
1951 LEASWEVLER AGIDPTTLKG TPTGTYVGVM YHDYAAGLAQ DAQLEGYSML
2001 AGSGSWSGR VAYTLGLEGP AVTVDTACSS SLVSIHLAAQ ALRQGEGTLA
2051 liAGGVTVMAT PEVFTGFSRQ RGLAPDGRCK PFAAAADGTG WGEGVGVLLL
2101 ERLSDARRHG RRVLGWRGS AVNQDGASNG LTAPlSrGPSQE RVIRQALASG
2151 GLSSVDVDW EGHGTGTTLG DPIEAQALLA TYGQGRPVDR PLWLGSVKSN
2201 IGHTQAAAGV AGVIKMVMAM RHGWPASLH VDVPSPHVEW DSGAVRLiAVE
2251 SVPWPEVEGR PRRAGVSSFG ASGTNAHVIV ESVPDGLGED SVSVSGEAPE
2301 TETDGRLVPW WSARSPQAL RDQALRLRDA VAADSTVSVQ DVGWSLiLKTR
2351 ALFEQRAWy GRERAELLSG IxAVLAAGEEH PAVTRSREDG VAASGAVVWL
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2401 


FSGQGSQLVG 


MGAGLYERFP 


VPAAAPDEVC 




Avjvj JLiKiIi VV r 


2451 


GPRERLDHTM 


WAQAGLFALQ 


VGIiARLWESV 


VJ V Xv XT XJ V V JUVjrXx 




2501 


AGVFDLADAC 


RWGARARLM 


GGLPEGGAMC 


AVQATPAELA 


ADVDDSGVSV 


2551 


AAVNTPDSTV 


ISGPSGEVDR 


IAGVWRERGR 


KTKALSVSHA 


FHSALMEPML 


2601 


AEFTEAIREV 


KFTRPKVSLI 


SNVSGLEAGE 


EIASPEYWAR 


HVRQTVLFQP 


2651 


GIAQVASTAG VFVELGPGPV LTTAAQHTLD DVTDRHGPEP^VlVSS LAGER 


2701 


PEESAFVEAM 


ARLHTAGVAV 


DWSVLFAGDR 


VPGLVELPTY AFQRERPWLS 


2751 


GRSGGGDAAT 


LGLVAAGHPL 


LGAAVEFADR 


GGCLLTGRLS 


RSGVSWIiADH 


2801 


WAGAVLVPG 


AAIiVEWALRA 


GDEVGCVTVE 


ELMLQAPLW 


PEASGLRVQV 


2851 


WEEAGEDGR 


RGVQIYSRPD 


ADAVSGDDSW 


ICHATGTLTP 


QHTDAPITOGL 


2901 


AGAWPAAGAV 


PVDLAGFYER 


VADAGYAYGP 


GFQGLRAVWR 


HGQDLLAEW 


2951 


LPEAAGAHDG 


YGIHPAT.TiDA 


TLHPALLLDW 


PGEVQDDDGK 


VWLPFTWNQV 


3001 


SU^AAGAATV 


RVRLSPGEHD 


EAEREVQVLV 


ADATGTDVLS 


VGSVTLRPAD 


3051 


IRQLiQAVPGH 


DDGLFSVDWT 


PLPLSRTDVS 


QTDADGDADW 


V V JjbDCjVCjblj 


3101 


ADWSAAGGE 


APWAWAPVG 


ASAGGGLAGF 


DRREGLDGRL 


WERVIiSLVQ 


3151 


EFLAAPELAE 


SRIjLVLTRGA 


VATGGDGDGD 


VDASAAAVWG 


LVRSAQSENP 


3201 


GRFIIiliDVDM 


DVDVDVDMDV 


DVDVDVDVDV 


DGDGNGSDLD 


PDLNGRRLPH 


3251 


ATLRHAAEEL 


DEPQLALRDG 


QLLVPRLVRA 


TGGGLWAPT 


DRAWRLDKGS 


3301 


AETLESVAPV AYPGVMEPLG 


PGQVRLGIHA AGINFRDVLV 


SLGMVPGQVG 


3351 


LGGEGAGWT 


ETGPDVTHLS 


VGDRVMGVLH 


GSFGPTAVAD 


TRMVAPVPQG 


3401 


WDMRQAAAMP 


VAYLTAWYGL 


VELAGLKAGE ■ 


RVLIHAATGG 


VGMAAVQIAR 


3451 


HLGAEVFATA SAAKHWLEE 


MGIDAAHRAS 


SRDLAFEDTF 


RQATDGRGMD 


3501 


WLNSLTGEF 


IDASLRLIiGD 


GGRFLEMGKT 


DVRTPEEVAA 


EYPGVTYTVY 


3551 


DLVTDAGPDR 


lAVMMSELGE 


RFASGALDPL 


PVRSWPLDKA 


REAFRFMSQA 
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3601 


KHTGKLVLDV 


PAPLDPDGTV 


LITGGTGALG 


QWAEHLVRE 


WGVRHLLLAS 


3651 


RRGLDAPGSG 


ELTUDRLSDLG 


AEVTVAAADV 


SDPASWELV 


GKTDPSHPLT 


3701 


GWHAAGVLE 


DGIVTAQTPE 


GLARVWAAKA 


AAAANLHEAT 


REMRLGLFW 


3751 


FSSAAATLGS 


PGQANYAAAN AYCDALMQRR 


RAAGQVGLSV 


GWGLWEAPDA 


3801 


KPGVAADAKP 


DVAADAKTGV AADGTPQGMT 


GTLSGTDVAR 


MARIGVKAMT 


3851 


SAHGLALLDA 


AHRHGRPHLV 


AVDLDTRVLA 


HKPAPALPALr 


LRAFAGDQGG 


3901 


QGGGRGGGRG 


GGPARPAAAT 


TRQNVDWAAK 


LSVLTAEEQH 


RTLIiDLVRTH 


3951 


AAAVXiGHAGT 


DAVRADAAFQ 


DLGFDSLTAV 


ELRNRLSAST 


GLRLPATFIF 


4001 


RHPTPSAIAD 


ELRAQLAPAG 


ADPAAPLFGE 


LDKLETVITG 


HAHDESTRTR 


4051 


LAARLQNLLW 


RLDDTSARSD 


HAAGASDADG 


DAVENRDLES 


ASDDELPELI 


4101 


DRELPS* 











MonAVI, polyketide synthase multi-enzyme MONS6, housing extension
module 9 Length: 1701 amino acids






1 MPGTODMPGT EDKLRHYLKR VTADLGQTRQ RLRDVEERQR EPIAIVAMAC
51 RYPGGVASPE QLWDLVASRG DAIEEFPADR GWDVAGLYHP DPDHPGTTYV
101 REAGPLRDAA RFDADFFGIN PREALAADPQ QRVLLEVSWE LFERAGIDPA
151 TLKDTLTGVY AGVSSQDHMS GSRVPPEVEG YATTGTLSSV ISGRIAYTFG
201 LEGPAVTLDT ACSASLVAIH LACQALRQGD CGLAVAGGVT VLSTPTAPVE
251 FSRQRGLxAPD GRCKPFAEAA DGTGFSEGVG LILLERLSDA RRNGHQVLGV
301 VRGSAVNQDG ASNGLTAPND VAQERVIRQA LTNARVTPDA VDAVEAHGTG
351 TTLGDPIEGN ALLATYGKBR PADRPLWLGS VKSNIGHTQA AAGVAGVIKM
401 VMAMRHGELP ASLHIDRPTP HVDWEGGGVR LLTDPVPWPR ADRPRRAGVS
451 SFGISGTUAH LIVEQAPAPP DTADDAPEGA ATPGASDGLV VPWWSARSP
501 QALRDQALRL RDFAGDASRA PLTDYGWSLL RSRALFEQRA WAGRERAEL
551 LAGLAALAAG EEHPAVTRSR EEAAVAASGD WWLFSGQGS QLVGMGAGLY
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601 


ERFPVFAAAF 


DEVCGLLEGE 


LGVGSGGLRE 


WFWGPRERL 


DHTVWAQAGL 


651 


FALQVGLARL 


WESVGVRPDV 


VLGHSIGEIA 


AAHVAGVFDL 


ADACRWGAR 


701 


ARLMGGLPEG 


GAMCAVQATP AELAADVDGS 


SVSVAAVNTP 


DSTVISGPSG 


751 


EVDRIAGVWR 


ERGRKTKALS 


VSHAFHSALM 


EPMLGEFTEA 


IRGVKFRQPS 


801 


IPIiMSWSGE 


RAGEEITSPE 


YWARHVRQTV 


LFQPGVAQVA 


AEARAFVELG 


851 


PGPVLTAAAQ 


HTLDHITEPE 


GPEPWTASL 


HPDRPDDVAF ' 


SlAMADLHVA 


901 


GISVDWSAYF 


PDDPAPRTVD ; 


LPTYAFQGRR 


FWLADIAAPE 


AVSSTDGEEA 


951 


GFWAAVEGAD 


fqalcdtlhlT KDDEHRAALE 


TVFPALSAWR 


RERRERSIVD 


1001 


AWRYRVDWRR 


V ii Jj jr X IT V XT o/i. 


o X yjcUAU X 


GAWLIVAPTH 


GSGTWPQACA 


1051 


RAIiEEAGAPV 


xC± VliiiVjirrlAiJ 


ssJ\jJrlt\LJiJ V 


WRASCADDTT 


QLGGVLSLLA 


1101 


LAEAPATSSD 


J. X OXl X o X O ^KJ 


X O'OXJrioriLTXj X 


GTLTLLHGLL 


DAGVEAPIiWC 


1151 


ATRGAVSCGD 


ADPLVSPSQA 


PVWGLGRVAA 


LEHPELWGGL 


VDLPADPESL 


1201 


DASALYAVLR 


GDGGEDQVAL 


RRGAVLGRRL 


VPDATPDVAP 


GSSPDVSGGA 


1251 


AHADATSGEW 


QPHGAVLVTC 


GVGHLADQW 


RWIiAASGAEH 


WLLDTGPAN 


1301 


SRGPGRMDDL 


AAEAAEHGTE 


LTVIiRSLSEL 


TDVSVRPIRT 


VIHTSLPGEL 


1351 


APIiAEVTPDA 


LGAAVSAAAR 


LSELPGIGSV 


ETVLFFSSVT 


ASLGSREHGA 


1401 


YAAANAYLDA 


LAQRAGADAA 


SPRTVSVGWG 


IPTOLPDDGDV 


ARGAAGLSRR 


1451 


QGLPPLEPQL 


ALGALRAALD 


•GGKGHTLVAD 


lEWERFAPLF 


TLARPTRLLD 


1501 


GIPAAQRVLD 


ASSESAEASE 


NASALRRELT 


ALPVRERTGA 


LLDLVRKQVA 


1551 


AVLRYEPGQD 


VAPEKAFKDL 


GFDSLWVEL 


RNRLRAATGL 


RLPATLVYDY 


1601 


PTPRTLAAHL 


LiDRVLPDGGA 


AELPVAAHLD 


DLEAALTDLP 


ADDPRRKGLV 


1651 


RRLQTLLWKQ 


PDAMGAAGPA 


DEEEQAAPED 


LSTASADDMF 


ALIDREWGTR 


1701 


* 











MonH, probable regulatory protein Length: 981 amino acids 
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1 


VSGVERGVGS 


AGPVEQGDGL 


AGLVERAEAL 


AALRGAFDGS 


PGTGGSLWL 


51 


SGAVGTGKTA 


LLRAWADRIG 


ADADALVLTA 


TACRAERDLP 


LGVLEQLVRS 


101 


PGLPPASAER 


ALAWWDEEAS 


ATPGKTDANG 


TSANGTDANG 


TGAGQTGAGQ 


151 


AGVGQTGVGG 


EPVLAASALR 


GLCEVLRDLL 


AERPWVAVD 


DAHHADAASL 


201 


QCLLSWRRL 


RSARLHVLFT 


EYAHQKAQNA 


LLSSEFLHEP 


ALRRIRLEPL 


251 


SKAGVEALLA 


RHLDERTAQP 


LTPWHGMSA 


GHPLLVRALA 


ElDHRAAGGAG 


301 


EAYGRAVLSF 


LYRHETPVTQ 


. VARAI AALGA 


HAGPGQVGRIi 


LDVDAASVER 


351 


AVRQLTVAEV 


liHEGRLCHPif FAAAVLDGMP 


PEERRALHGR 


VADLLHEEGA 


401 


PATEVAAHLV 




VPVFQEAAQL 


ALDEDQVETG 


VDYLRAAHQR 


451 


CRGAAQRAAV 


VGAIiADAEWR 


LDPAKVLRHL 


PDPAAMAPQT 


DPAALAPHTD 


501 


PAPTAAPTAA 


PTPTPIPTTP 


PLPTHLLWHG 


RVEEGLDAIG 


TLTGPGPNPA 


551 


GAPPMNPADL 


DTPWLWGAYL 


YPGHVKERLG 


SGALSPQRST 


PPAVTPELQG 


601 


AGTLMNDLLH 


GGERDATEAA 


ERALNRYRLG 


PRTIAVQTAA 


LAAIiTYRDRP 


651 


HRAAAWCDGL 


VAQADERNSP 


TWRALFTAWR ALLHLRQGDP 


AAAEQRAETA 


701 


LALLGSKGWG 


AAIGLPLAAA 


VQAKAALGDV 


DGAAALLERP 


VPQAVFQTRT 


751 


GliHYLAARGR 


YHtiATGCHYA 


ALCDFYACGT 


RMSSWGVDLP 


ALEPWRLGAA 


801 


EAYLALGEGL 


LARQLVDGQL 


PLPTPDDGRT 


WGMTLRLRAA 


TSPAPARAEL 


851 


LDEAVAVLRE 


SGDTFELARA 


VADQAVAVRE 


GGEAERARLL 


ARKAELLARR 


901


WGSAPAPATV 


PEPPERPGPA 


TPDAELTSAE 


RRVAEIiAAEG 


FTNREISRKL 


951 


CVTVSTVEQH 


LTRIYRKLDV 


RRLDLQAALG 






MonCI, flavin-dependent epoxidase Length: 496 amino acids 


1 VTTTRPAHAV VLGASMAGTL AAHVLARHVD AVTWERDAL PEEPQHRKGV
51 PQARHAHLLW SNGARLIEEM LPGTTDRLLA AGARRLGFPE DLVTLTGQGW
101 QHRFPATQFA LVASRPLLDL TVRQQALGAD NITVRQRTEA VELTGSGGGS
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151 GGRVTGVWR DLDSGRQEQL EADLVIDATG RGSRLKQWLA ALGVPALEED 
201 WDAGVAYAT RLFKAPPGAT THFPAVNIAA DDRVREPGRF GWYPIEGGR 
251 WLATLSCTRG AQLPTHEDEF IPFAENLNHP ILADLLRDAE PLTPVFGSRS 
301 GANRRLYPER LEQWPDGLLV IGDSLTAFNP lYGHGMSSAA RCATTIDREP 
351 ERSVQEGTGS ARAGTRALQK AIGAAVDDPW ILAATKDIDY VNCRVSATDP 
401 RLIGVDTEQR LRFAEAITAA SIRSPKASEI VTDVMSLNAP I^AELGSNRFL 
451 MAMRADERLP ELTAPPFLPE ELAWGLDAA TISPTPTPTP TAAVRS 

MonBII, carbon-carbon double bond isomerase Length: 141 amino acids

1 MPDEAARKQM AVDYTSlERINA GDIEGVLDLF TDDIVFEDPV GRPPMVGKDD 

51 LRRHLELAVS CGTHEVPDPP MTSMDDRFW TPTTVTVQRP RPMTFRIVGI
101 VELDEHGLGR RVQAFWGVTD VTMDDPAGPA DTTHPEGIRA *
MonBI, carbon-carbon double bond isomerase Length: 144 amino acids 

1 MNEFARKKRA LEHSRRINAG DLDAIIDLYA PDAVLEDPVG LPPVTGHDAL 
51 RAHYEPLLAA HLREEAAEPV AGQDATHALI QISSVMDYLP VGPLYAERGW 
101 LKAPDAPGTA RIHRTAMLVI RMDASGLIRH LKSYWGTSDL TVLG 
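
The listings here share one layout: each row opens with the 1-based position of its first residue, the residues follow in groups of ten, and a trailing `*` (where present) marks the end of the protein. The following is a minimal sketch, not part of the original disclosure, of how a row-formatted listing could be checked against its declared length. The function name `parse_listing` and the constant `MONBI_ROWS` are hypothetical, and the rows are the MonBI rows transcribed from above.

```python
import re

# One-letter residue codes only; anything else suggests scan damage.
GROUP = re.compile(r"[A-Z]+")


def parse_listing(rows):
    """Concatenate position-numbered rows of residue groups into one sequence."""
    seq = ""
    for row in rows:
        tokens = row.split()
        if not tokens:
            continue  # skip blank lines between rows
        start, groups = int(tokens[0]), tokens[1:]
        # The leading number is the 1-based position of the row's first residue.
        if start != len(seq) + 1:
            raise ValueError(
                f"row labelled {start} starts at residue {len(seq) + 1}"
            )
        for group in groups:
            group = group.rstrip("*")  # '*' marks the end of the protein
            if group and not GROUP.fullmatch(group):
                raise ValueError(f"non-residue characters in group {group!r}")
            seq += group
    return seq


# MonBI rows as listed above (declared length: 144 amino acids).
MONBI_ROWS = [
    "1 MNEFARKKRA LEHSRRINAG DLDAIIDLYA PDAVLEDPVG LPPVTGHDAL",
    "51 RAHYEPLLAA HLREEAAEPV AGQDATHALI QISSVMDYLP VGPLYAERGW",
    "101 LKAPDAPGTA RIHRTAMLVI RMDASGLIRH LKSYWGTSDL TVLG",
]

monbi = parse_listing(MONBI_ROWS)
print(len(monbi))  # 144, matching the declared length
```

A row whose position label disagrees with the running residue count, or a group containing characters other than capital letters, raises `ValueError`, which makes damaged rows easy to locate.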

MonAVIII, polyketide synthase multi-enzyme MONS8, housing extension
modules 11 and 12 Length: 3754 amino acids 

1 MSNEEKLLDH LKWVTAELRQ ARQRLHDKES TEPVAIVGMA CRYPGGARSA 

51 EDLWELVRDG GDAVAGFPDD RGWDLESLYH PDPEHPATSY VRDGAFLYDA 

101 GHFDAEFFGI SPREATAMDP QQRLLLETAW EAIEHAGMNP HALKGSDTGV 

151 FTGVSAHDYL TLISQTASDV EGYIGTGNLG SWSGRISYT VGLEGPAVTV 

201 DTACSSSLVA IHLASQALRQ GECSLALAGG STVMATPGSF TEFSRQRGLA 

251 PDGRCKPFAA AADGTGWGEG AGWALELLS EARRRGHKVL AVIRGSATNQ 

301 DGTSNGLAAP NGPSQERVIR AALANARLSA EDIDAVEAHG TGTTLGDPIE
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351 


AQALIATYGQ 


GRPEDRPLWL GSVKSJSTIGHT 


QAAAGVAGVI 


KMVMAMRNGL 


401 


LPTSLHIDAP 


SPHVQWEQGS VRLLSEPVDW 


PAERTRRAGI 


SAFGISGTMA 


451 


HLILEEAPPE 


EDAPGPVAAE PGGWPWWS 


GRTPDALREQ 


ARRLGEPAAG 


501 


IiADASVSEVG 


WSLATTRALF DQRAWVGRD 


LAQAGASLEA 


LAAGEASADV 


551 


VAGVAGDVGP 


GPVLVFPGQG SQWVGMGAQL 


LDESPVFAAR 


lAECEQALSA 


601 


HVDWSLSDVL 


RGDGSELSRV EWQPVLWAV 


MVSLAAVWAD 


'^GITPAAVIG 


651 


HSQGEMAAAC 


VAGALSLEDA ARIVAVRSDA 


LRQLQGHGDM 


ASLSTGAEQA 


701 


AELIGDRPGV 


WAAVNGPSS TVISGPPEHV 


AAWADAEAQ 


GLRARVIDVR 


751 


YASHGPQIDQ 


LHDLLTDRLA DIQPTTTDVA 


FYSTVTAERL 


DDTTALDTAY 


801 


WVTNLRQPVR 


FADTIEALLA DGYRLFIEAS 


PHPVIiNIiGIQ 


ETIEQQAGAA 


851 


GTAVTIPTLR 


RDHGDTTQLT RAAAHAFTAG 


APVDP7RRWFP 


ADPTPRTVDL 


901 


PTYAFQHKHY 


WVEPPAAVAA VGGGHDPVEA 


RVWQAIEDLD 


IDALAGSLEI 


951 


EGQAESVGAL 


ESAIiPVLSAW RRRHREQSTV 


DSWRYQVTWK 


HLPDVPAPEL 


1001 


SGAWLLLVPA AHADHPAVIA TAQTLTAHGG 


EVRRHWDAR AMERTELAQE 


1051 


LRVLMDGAAF 


AGWNLLALiD EEPHPEHSAV 


PAGLAATTAL 


VQALADNGAD 


1101 


lAVRTLTQGA VSTSAGDALT HPVQAQVWGL 


GRVAALEYPR 


LWGGLVDLPA 


1151 


RIDHQTLARL AAALVPQDED QISIRPSGVH ARRLAHAPAN 


TVGSGLGPTRP 


1201 


DGTTLITGGT 


GGIGAVIiARW LARAGAPHLL 


LTSRRGPDAP 


GAQEIiAAELT 


1251 


ELGAAVTVTA 


CDVGDREQVR RLIDDVPAEH 


PLTAVIHAAG 


VPNYIGLGDV 


1301 


SGAELDEVLR 


PKALAAHHm ELTRELPLSA 


FVMFSSGAGV 


WGSGQQGAYG 


1351 


AANHFLDALA 


EHRRAEGLPA TSIAWGPWAE 


AGMAADQAAL 


TFFSRFGIiHP 


1401 


LSPELCVKAL 


QQALDAGETT LTVANFDWAQ 


FTSTFTAQRP 


SPLIADLPEN 


1451 


RRASAPAAQQ 


EDATEASSLQ QELTEAKPAQ 


QRQLLLQHVR 


SQAAATLGHS 


1501


DVDAVPATKP 


PQELGFDSLT AVELRNRLNK 


STGLTLPTTV 


VFDHPTPDAL 
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1551 


TDVLRAELSG 


DAAASADPVR 




Urit:'d.I\± VLylvJA UKx FGDVKSA 


1601 


EELWDLVAAG 


KDAMGAFPDD 


RGWDIjETTjYD 




1651 


GDFDAGFFGI 


SPREAVAMDP 


QQRLLLETAW 


J^Lt\ J. I^j\jn\j±jLJS\. JZi ± i_j JWj o UJ\\j V 


1701


FTGLTIFDYL 


ALVGEQPTEV 


EGYIGTGNLG 


\-. V jfio V o X V jLivjJ-iiiijir/yyi x X 


1751 


DTGCSSSLVA 


IHQAAHALRQ 


GECSLALAGG 




1801 


KDGRCKPFAA AADGTGWAEG . VGLWLERLS 


earrnghnvl\virgsainq 


1851 


DGTSNGLTAP 


NGQAQQRVIR 


QALANARLSA 


EDVDAVEAHG TGTMLGDPIE 


1901 


ASALVATYGK. ERPADRPLVfti 


GSIKSNIGHA 


QASAGVAGVI KIVIVMALRISFEQ 


1951 


LPASLHIDAP 


TPHVDWDGSG 


VRLLSEPVSW PRGERPRRAG VSAFGISGTN 


2001 


AHLILEQAPD 


APEPVTAPAE 


DAAAPAGWP 


WWSARGEEA LRAQARLLAD 


2051 




XT J-ii^ V \j W o J-i V JV 


TRSVFENRAV 


WGKDRQTLL AGLRSLAAGE 


2101 


PSPDWEGAV 


QGASGAGPVL 


VFPGQGSQWV 


GMGAQLLDES PVFAARIAEC 


2151 


ERALSAHVDW 


SLSAVLRGDG 


SELSRVEWQ 


PVLWAVMVSL ASVWADYGIT 


2201 


PAAVIGHSQG 


EMAAACYAGA 


LSLEDAARIV AVRSDALRQL MGQGDMASLG 


2251 


AGSEQVAELI 


GDRPGVCVAA 


VNGPSSTVIS 


GPPEHVAAW ADAEARGLRA 


2301 


RVIDVGYASH 


GPQIDQIJIDL 


LTERLADIRP 


TTTDVAFYST VTAERLDDTT 


2351 


TLDTDYWVTN 


LRQPVRFADT 


lEALLADGYR 


LFIEASPHPV LNLGMEETIE 


2401 


RADMPATWP 


TLRRDHGDAA 


QLTRAAAQAF 


GAGABVDWTG -WFPAVPLPRV 


2451 


VDLPTYAFQR 


ERFWLEGRRG 


LAGDPAGLGL 


ASAGHPLLGA AVELADGGSH 


2501 


LLTGRISPRD 


QAWIiAEHRVM- 


DTVLLPGSAF 


VEIALQAAVR AGCAELAELT 


2551


LHTPLAFGDE 


GAGAVDVQW 


VGSVAEDGRR 


PVTVHSRPTG EGEEAVWTRH 


2601 


AAGWAPPGP 


DAGDASFGGT 


WPPPGATPVG 


EQDPYGEIiAS YGYDFGPGSQ 



2651 GLVSAWRLGD DLFAEVALPE AESGRADRYQ VHPVLLDATL HALILDAVTS 
2701 SADTDQVLLP FSWSGLRVHA PGAEKLRVRI ARTAPDQLAL TAVDGGGGGE
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2751 


PVLTLESLTV 


RPVAAHQIAG 


ARAADRDALF 


RLVWMEVAAR 


AEETGGGAPR 


2801 


AAVLAPVESG 


PMGGTSAGAL 


ADALSDALAA 


GPVWDTFGAL 


RDGVAAGGEA 


2851


PDWIAVCAA 


PGAGAGAVAD 


ADGRGGDPAG 


YARLATVSLL 


SLLKEWVDDP 


2901 


AFAATRLVW 


TRGAVAARPG 


ETAGDLAGAS 


LWGLVRSAQA 


ENPGRLTLLD 


2951 


VDGIiESSPAT 


LTGVIiASGEP 


ELALRDGRAY 


VPRLVRDDAS 


VRLVPPVGSL 


3001 


TWRLARCQEA GGGQQLSLVD 


APEAGRALEP 


HEVRVAVRAA "APGPLTAGQV 


3051 


EGAGWTEVG 


GEVGSVAVGD 


RVMGLFDAVG 


PVAVTDAALL 


MPVPAGWSWA 


3101 


QAAGSLGAYV 


SAYHVLADVV 


APRGGETLLV 


GEETGSVGRA 


VLRtALAGRW 


3151 


RVEAVDGAST 


ADDSGAERAA 


DVTLRHEGAL 


WHRAGGRPD 


EGQAWPPEP 


3201 


GRVREILAEL 


TELTEIiAEIT 


ESAEPGLPAE 


RGDSRALTPL 


DITVPTOIRQA 


3251 


PAAMAAPPSA 


GTTVFSLPPA 


FDPEGTVLVT 


GGTGALGSLT 


ARHLVERYGA 


3301 


RHLLLSSRRG 


ADAPGALELA 


ADLSALGARV 


TFAACDPGDR 


DEAAALLAAV 


3351 


PSDHPLTAVF 


HCAGTVNDAV 


VQNLTAEQVE 


EVMRVKADAA 


WHLHELTRDA 


3401 


DLSAFVLYSS 


VAGLLGGPGQ 


GSYTAANAFL 


DALARHRHDG 


GAAATSLAWG 


3451 


YWEIiASGMSG 


RLTDADRARH 


ARAGWGLGA 


DEGLALLDAA 


WAGGLPLYAP 


3501 


VRLDIiARMRR 


QAQSHPAPAL 


LRDLVRGGSK 


SGGGAVSAGA 


AALLKSLGAM 


3551 


SDPEREEALL 


DLVCTHIAAV 


LGYDAATPVN 


ATQGLRELGF 


DSLTAVEIiRN 


3601 


RLSAATGLKL 


PATFVFDHPN 


PAELAAQLRQ 


ELAPRAADPL 


ADVLAEFERI 


3651 


EDSLLSVSSK 


DGSARAEIAG 


RLRATLARLD APQDTAGEVA 


VATRTRIQDA 


3701 


SADEIFAFID 


RDLGRDGASG 


QGNGQPTGQG NGHGNGNGNG 


NGNGHGQAVE 


3751 


GQR* 











MonAVII, polyketide synthase multi-enzyme MONS7, housing extension 
module 10 Length: 1642 amino acids 

1 MAHTEEKLLE YLKRVTADLR QTERRLQDVE SAGHEPVAVI GMACRLPGGV 
51 RSPEEFWELV STGGDAVAPL PGNRNWDLDS LYDPDPESTG TSYVREGGFV 
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101 


YDAGDFDPTF 


FGIGPTEAAA 


MAPQQRLALE 


TAWEAIERAG 


IDPLSLRSSD 


151 


TSTFIGCDGL 


DYALGASEVP 


EGTAGYFTIG NSGSVTSGRV AYTLGLEGPA 


201 


VTVDTACSSS 


LVSLHIiATQA 




w XXV I'Xl^ O XT 


APT.Tr^TTQ'PT D 
rixrJLjJ-Vjri? oXiijJK 


251 


GLAPDGRCKP 


FSASSDGMGM 






T^TT .flT/'T P G a 
Xv V ijH. V X Ktjr o A 


301 


INQDGASNGL 


TAPNGPAQER 


V T "R A AT t A MA 1? 






351 


PIEAGALISA 


YGRERPEDRP 








401 


HDLLPAILHV 


DAPSPHVEWD 


GSGLRLLTDP 


VKWPRGERPR 




451 


GTNAHIilLEE 


APPEEEDVPCf 


' SVAEEPGGW 


PWWSGRTPD 




501 


EFAAGPADAS 


AADVGWSLTT 


TRSVFEHRAV 


WGRDRDALT 




551 


ASAGWAGVA 


GDVGPGPVLV 


FPGQGSQWVG 


MGAQLLDESP 


A A D T A "COT? 


601 


RALSAYVDWS 


LSAVLRGDGS 


ELSRVEWQP 


VLWAVMVSLA 


A T 7Ta7 a T^Vr*»T T*T>T5 


651 


AAVIGHSQGE 


MAAACVAGAL 


SLEDAARIVA VRSDALRRLQ 


oiivjrX^iYiAoXjo X 


701 


GAEQAAELIG 


DRPGWVAAV 


NGPSSTVI SG 


PPEHVAAWA 


DAEARGLRAR 


751 


VIDVGYASHG 


PQIDQLHDLL 


TERLADIRPA 


NTDVAFYSTV 


TAERLTDTTA 


801 


LDTDYWVTNL 


RQPVRFADTI 


EALLADGYRL 


FIEASAHPVL 


GLGMEETIEQ 


851 


ADIPATWPT 


LRRDHGDTTQ 


LTRAAAHAFT 


AGAPVDWRRW 


PPADPTPRTV 


901


DLPTYAFQHQ 


HYWLERSASA 


SGAVSGEQSA AEAQLWHAVE 


ELDLGLLAET 


951 


LGSEEGSEEA 


VRALEPALPV 


LKGWRRRHQD 


QATIDSWRYR 


VTWKQRSDGP 



1001 APELGGDWLL FVPADKAEHP AYRATAEALS EHGAAAVRLH PVETGRAGRQ 

1051 EIAAVDTAGL AGIVNLLALD EEPHPEHPAV PAGIiAATTAL LQALGDNGTT 

1101 APLHTVTQGA VSTGATDPLT HPLQAHVWGL GRVAALEHPR LWAGLVDLPA 

1151 RIDRHTLPRL AAALLPQDDE DQTAVRPTGI HHRRLTHAVG SIQNPVHSEA 

1201 TWRPRGTTLI TGGTGGIGAV LARWIiARQGA PRLHLTSRRG PDAPGARELA 

1251 AELDGLGTAV TITACDVSDP RQLSGLIDDM PAEHPLTAVI HAAGMTDLTA, 
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13 01 IGDLTTARLG EVLGSKSDAA WNLHELTRDL DLSAFVMFSS GAGVWGSGQQ 

1351 GAYGAAiraFL DALAEHRRAQ GLPATSIAWG PWAEAGMSAD PESLTYFKRF 

1401 GLLPIAPDLC VKALHQAVDA GDATLTVANF DWAKFTPTFT AQRPSPFLDD 

1451 LPENQREAEQ TGTAAETSAF REELAKTPAS QRLGFLVQQV RTYAAATLGR 

1501 TVEDIPAAKP FQELGFDSLT AVQLRNQLNT TTGLSLPATV IFDHPTPEAL 

1551 ATHLRGQLGD GAEVAGEGDV LAALDKWDTA FGAAEVDEApTTIRRIVGRLQV 

1601 LVSKWSPAQD GPEGTDSAHA DLEAASADDI FDLISSEFGK S* 

MonD, cytochrome P450 hydroxylase Length: 431 amino acids 

1 VGLTVGPDNA KRGIVPITDS KPAATFPDLV DPSFWARPHA ERVALFEEMR 

51 GLPRPAFIRQ NMPGVPWTFG YHALVKYADI VEVSRRPQDF SSNGATTIIG 

101 LPPELDEYYG SMINMDNPEH SRLRRIVSRS FGRNMIPEFE AVATRTARRI 

151 IDELIARGPG DFIRPVAAEM PIAVLSDMMG IPAEDHDFLF DRSISTTIVGPL 

201 DPDYVPDRAD SERAVIEASR ELGDYIAGLR AERLAAPGND LITKLVQVQA

251 DGEQLTRQEL VSFFILLVIA GMETTRNAIS HALVLLTEHP EQKQLLLSDF

301 DTHAPNAVEE ILRVSTPINW MRRVATRDCD MNGHRFRRGD RIFLPYWSGN 
351 RDESVFPDPY RFDITRGTNA HVTFGAVGPH VCLGAHLARM EITVLYRELL 
401 AALPQIHAVG QPRRLDSSFl EGIKHLHCAF * 
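
The polypeptide tables here use a numbered layout: a position index followed by groups of ten residues, with `*` marking the stop. A minimal sketch of reading such a block back into one contiguous string is given below. The function name `parse_listing` and the shortened sample are hypothetical and are not part of the specification; the sketch assumes the index is the first whitespace-separated token on each line.

```python
import re


def parse_listing(text: str) -> str:
    """Join a numbered, ten-residue-grouped listing into one sequence string."""
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        # Drop the leading position index (e.g. "51") when one is present.
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]
        for token in tokens:
            # Keep letters only; the "*" stop marker and stray marks are dropped.
            residues.append(re.sub(r"[^A-Za-z]", "", token))
    return "".join(residues).upper()


if __name__ == "__main__":
    # Shortened, illustrative sample in the same layout (not a full entry).
    sample = "1 VGLTVGPDNA KRGIVPITDS\n51 GLPRPAFIRQ NMPGVPWTFG *"
    sequence = parse_listing(sample)
    print(len(sequence), sequence)
```

Comparing the parsed length with the stated "Length: N amino acids" of an entry is a quick way to spot groups that were lost or damaged in transcription.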

MonRI, probable activator protein Length: 268 amino acids 

1 VRYEMLGPLR IKDGNDYATI NAQKVEIVLT VLLIRADRW SLEQLMREIW
51 GEDLPRRATA GIiHVYISQLR KFLKVPGSAG NPVETRAPGY VLHKRDDDQI
101 DAQIFPELVD VGRSLLREKR FDEAASCFGQ ALALWRGPIL GQGGNGPGTN
151 GPIIDGFSTW LTEIRLECQE MLVECQLQLG RHREAVGMLY ALTAENPMCE
201 AFYRQLMLAL YRSERQADAL KVYQSVRKTL NDELGLEPGR PLQELQRAIL
251 AGDMHIiMSPP PLiALSGR*

MonAX, thioesterase Length: 278 amino acids

1 LSAFLAKGKI LSAFPPPDMS DP^iTIRRFRPR PEAWRLVCF PHAGGSASYY
51 HPLAQSPTLP TDSEVLAVQY PGRQDRRRER LLDDIGELAD LITDALGPFD
101 DRPLAFFGHS MGAVIiAYEVA QRLRERTGKQ PCRLFVSGRR APSRFRRGTV
151 HLLDDTELAA ELRRAGGTDP RFLDDEELLA EIIPWRNDY RAVELYRWNP
201 SPPLSCPITA LVGDRDPQAP LDEVEAWQQH TEGPFDLKVF AGGHFYLNTH
251 QQGVTEVISK ALADSAQQRA TARGNAR*

ORF29, a homologue of CapK involved in cell wall biosynthesis Length: 428 amino acids

1 liADLVAHARS ASPYYRELYH GLPERIEDPT LLPVTDKKQL MDHFDDWPTD
51 RDITFEKVRA FTDDPELIGR RFLGRYLVAT TSGTSGRRGL FVLDDRYMNV
101 SSAVSSRVIiA SWLGPLGIAR AWHGGRFAQ LVATEGHYVG FAGYSRLRQD
151 GEARSKLVRA FSVHEPMSRL VAELNEYRPA FVIGYASTIM LFTAEQEAGR
201 LHIDPVLVEP AGETMTESDT DRIAAAFGAK VRTiyiYSATEC TYLSHGCAEG
251 WYHVlvIDDWAV LEPVDADHRP TPPGEFSHTT LISNLiANRVQ PFLRYDLGDS
301 VMLRPDPCPC GTPSPAIRVQ GRSGDILTFP SGRGDDVSLA PLAPSSLFDR
351 MPGVELFQIE QTAPSTLRVR WQAPGADAD HVWQRAHDGL THLLADNKLD
401 MVTVERGEEP PRQASGGKYR TIIPLAA*






LipB, lipase B Length: 338 amino acids

1 VKVPVEVTVR LSSWLGGLVA AVLAATVLPA SAASAADVSS PPLEIPAAEL
51 AKALHCGTEL GDLRDAGDKP TVLFVPGTGL KGEENYAWNY MAE^iKKKGYQ
101 SCWVDSPGRG LRDMQESVEY WYATRAIQE ATGRKVDLVG HSQGGLLTAW
151 ALRFWPDLPG KVDDMVTLGS PFQGTRIiASP CRPIAEVAGC PASVLQFARD
201 SNWSKALGAD GTPMPAGPSY TTIYSYADES WADGEAPSL PGAHRIGVQD
251 ICPGRPWPTH lAMWDQVSY DLVADAIEHP GPADTSRIDR AHCAKPVMPL
301 NSQEAVDALP GLLNFPIELL IHSQPWVDEE PPLRPYAR 

ORF31, putative ion pump Length: 309 amino acids 

1 MGHDHGPSAG AAGGTLSGTY RKRLLWTIGI SGSITVIQW GALLSGSLAL 

51 LADAAHSLTD AVGVSIiALGA ITLAQRAPTP RRTFGFCRVE IFSAVLNALL 

101 LWIFAWVLW SAIGRFSEPV EVKGGLMFW ALGGLAANLV GLWLLRDAKE 

151 KSIiNLRGAYL EVLGDALGSV AVIVGGLVIL LTGWQAADPI ASIVIGLLIV 

201 PRAYGLLRDS LHVLLEATPQ DVDLGEVRRH LLEERGWAV HDLHGWTVTS 

251 GMPVIiTAHW VTEEAIASGY GELLGRLQRC VGGHFDVAHS TIQLEPEGHV 

301 EEDGALHT*

ORF32, hypothetical membrane protein Length: 364 amino acids 

1 MTRAIiTLHDW IVAGIAWAG WAGLLLRAL LRWLGERASK TRWSGDDVIV 
51 DALRTLVPCA AITAGLAAAA GALPLTPRTG RNVTMTLTAL LILAATLTAA 
101 RIVTGLVKAV AQSRSGVAGS ATIFVNITRV WIiAMGFLIV LQTLGISIAP 
151 LLTALGVGGL AVALALQDTL ANLFAGVHIL AAKTVQPGDY IQLSSGEEGY 
201 WDINWRNTT VRQLSNNLVI IPNAKLAGTN MnSTYSRPEQE LSIMVQVGVS 
251 YDSDLEQVEK VTTEWDEVM AEITGAVPDH EAAIRFHTFG DSRISFTVIL 
301 GVGEFSDQYR IKHEFIKRLH QRYRAEGIRV PAPVRTVRVQ QGELPPPLGI
351 PHQRDTSTQA RliH*

AmtA, glycine amidinotransferase (partial coding sequence) 
Length: 131 amino acids 

1 MSPVNSHlsrEW DPLEEIIVGR LEGATIPSSH PWACNIPTW AARLQGIaAAG
51 FEYPQRLIEP AQQELDQFIA LLQSLDVTVR RPAAVDHKHR FGTPDWQSRG 
101 FCNSCPRDSM LWGDEIIET PMAWPCRCFE T

CLAIMS:

1. A DNA sequence which is (a) at least part of
the sequence set out in the appended sequence listing; or
(b) a variant of a sequence (a) which encodes a
polypeptide which is at least 80%, preferably at least
90%, identical with the corresponding peptide as set out
in table II; provided that it is not a sequence encoding
all or part of the polypeptide consisting of amino acids
1-920 encoded by mon AI as set out in table II.
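
The 80% and 90% identity thresholds can be illustrated with a short calculation. The sketch below is only one possible convention and is an assumption, not a definition taken from the claims: it takes two sequences that have already been aligned to equal length (with `-` for gaps), counts every column that is not a gap in both sequences, and treats a gap opposite a residue as a mismatch. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a precomputed pairwise alignment.

    Assumption: columns where both sequences carry a gap are ignored, and a
    gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Illustrative ten-residue fragments differing at one position.
    identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
    print(identity, identity >= 80.0, identity >= 90.0)
```

Other conventions (for example dividing by the length of the shorter sequence, or excluding gapped columns altogether) give different figures, so the convention should be stated alongside any reported value.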

2. A DNA sequence according to claim 1 comprising
the complete monensin gene cluster or a variant thereof. 

3. A DNA sequence encoding at least part of at least
one polypeptide which is necessary for the biosynthesis
of monensin, and which is encoded by DNA included in the
appended sequence listing or an allele, mutation or other
variant thereof; provided that said polypeptide is not
all or part of amino acids 1-920 encoded by mon AI as set
out in table II.

4. A DNA sequence according to claim 3 which
comprises at least part of one or more of the following
genes: mon BI, mon BII, mon CI, mon CII, mon H, mon RI,
mon RII, mon T, mon AIX and mon AX.

5. A DNA sequence according to claim 4 comprising
all of the genes listed therein or an allele, mutation or
other variant thereof.



6. A DNA sequence according to claim 3 encoding at
least part of one or more of the polypeptides set out
below, said polypeptide having the amino acid sequence as
set out in the appended sequence data or being a variant
thereof having the specified activity:

peptide      activity
mon CII      epoxyhydrolase/cyclase
mon E        S-adenosylmethionine-dependent methyltransferase
mon T        monensin resistance gene
mon RII      repressor protein
mon AIX      thioesterase
mon AI       polyketide synthase multienzyme
mon AII      polyketide synthase multienzyme
mon AIII     polyketide synthase multienzyme
mon AIV      polyketide synthase multienzyme
mon AV       polyketide synthase multienzyme
mon AVI      polyketide synthase multienzyme
mon AVII     polyketide synthase multienzyme
mon AVIII    polyketide synthase multienzyme
mon H        regulatory protein
mon CI       flavin-dependent epoxidase
mon BII      carbon-carbon double bond isomerase
mon BI       carbon-carbon double bond isomerase
mon D        cytochrome P450 hydroxylase
mon RI       activator protein
mon AX       thioesterase

7. A DNA sequence according to claim 6 encoding a
single enzyme activity of a multienzyme encoded by any of
mon AI-mon AVIII or a variant or part thereof.

8. A DNA sequence according to any preceding claim
encoding any one or more of the domains as set out in
Table I or a variant or part thereof.

9. A DNA sequence according to any preceding claim
which has a length of at least 30, preferably at least 60,
bases.

10. A recombinant cloning or expression vector
comprising a DNA sequence according to any preceding
claim.

11. A transformant host cell which has been
transformed to contain a DNA sequence according to any of
claims 1-9 and which is capable of expressing a
corresponding polypeptide.

12. A hybridisation probe which is a DNA sequence
according to any of claims 1-9.

13. Use of a probe according to claim 12 to detect a
PKS cluster, optionally followed by isolation of the
detected cluster.

14. Use of a probe according to claim 12 which
encodes at least part of a polypeptide having a known
function to detect genes encoding polypeptides having
analogous function.

15. Use according to claim 14 wherein the
polypeptide of known function is AT- of module 5 or the
regulatory protein encoded by mon RI.

16. A hybridization probe comprising a
polynucleotide which binds specifically to a region of the
monensin gene cluster selected from mon BI, mon BII, mon
CI, mon CII, mon H, mon RI, mon RII, mon T, mon AIX and
mon AX.



17. Use of a probe according to claim 16 in a method
of detecting the presence of a gene cluster which governs
the synthesis of a polyether, and optionally isolating a
gene cluster detected thereby.

18. Use of a probe according to claim 12 which
comprises a polynucleotide which binds specifically to a
gene responsible for levels of activity of the monensin
gene cluster, in a method of detecting an analogous gene
in a gene cluster for biosynthesis of another polyketide,
optionally followed by a step of manipulating the gene
detected thereby to alter the level of expression of said
other polyketide.

19. Use according to claim 18 wherein the gene is a
regulatory gene, resistance gene or thioesterase gene.

20. Use of the mon RI gene or variant and a monensin
promoter to control expression of a heterologous gene in
S. cinnamonensis.

21. Use of a portion of the monensin gene cluster
encoding a polypeptide having chain terminating activity,
preferably comprising at least one of mon AIX and mon AX
or a mutant, allele or other variant thereof encoding a
polypeptide having chain terminating activity, to effect
chain release of a peptide other than monensin.

22. Use of a portion of the monensin gene cluster
encoding a polypeptide having carbon-carbon double bond
isomerase activity, preferably comprising at least one of
mon BI and mon BII or a mutant, allele or other variant
thereof having isomerase activity to provide a desired
stereochemical outcome in the synthesis of a polyketide
other than monensin.

23. A polypeptide encoded by a portion of the
monensin gene cluster, preferably comprising at least one
of mon BI and mon BII or a mutant, allele or other variant
thereof, having carbon-carbon double bond isomerase
activity, or at least one of mon AIX and mon AX or a
mutant, allele or other variant thereof having chain
terminating activity.

24. An epoxidase enzyme encoded by mon CI or a
derivative or variant thereof having epoxidase activity.

25. A cyclase enzyme encoded by mon CII or a
derivative or variant thereof having cyclase activity.

26. Use of a portion of the monensin gene cluster
encoding a peptide having epoxidase or cyclase activity,
preferably comprising mon CI or mon CII or a mutant,
allele or other variant thereof encoding a polypeptide
having epoxidase or cyclase activity to provide a said
activity in the biosynthesis of a polypeptide other than
monensin.

27. A process for producing a polyketide containing
a desired starter unit comprising providing a PKS gene
having a loading module and a plurality of extension
modules, wherein the loading module includes a KSq domain
derived from a KS domain of a monensin extension module.

28. A process according to claim 27 wherein the KSq
domain is derived from KS of module 5 of monensin.

29. A process according to claim 27 or claim 28
wherein the starter unit also includes an ATq domain
derived from an AT domain which is naturally associated
with the KS domain.

30. A DNA sequence comprising DNA encoding at least
one PKS loading module and a plurality of PKS extension
modules, and which can be expressed to produce a
polyketide; wherein at least one of said modules or at
least one domain thereof is a monensin module or domain or
a variant thereof and is contiguous to a further one of
said modules or a domain to which it is not naturally
contiguous; provided that the sequence is not an ery
loading module, the first and second extension modules of
the ery PKS and the ery chain-terminating thioesterase in
which the DNA encoding AT of the first extension module
has been substituted by DNA encoding an ethylmalonyl-CoA


WO 01/68867

PCT/GB00/02072

AT from the monensin gene cluster. 



31. A DNA sequence according to claim 30 wherein
said further module or domain is also a monensin module
domain or variant thereof.

32. A DNA sequence according to claim 30 wherein
said further module or domain is a module or domain of a
PKS of a polyketide other than monensin or a variant
thereof.

33. A DNA sequence according to claim 30, 31 or 32
wherein said loading module is adapted to load a starter
unit other than a starter unit normally received by the
adjacent extension module.

34. A DNA sequence according to claim 33 wherein
said loading module is derived from a monensin extension
module or variant thereof.

35. A polyketide synthase encoded by the DNA
sequence of any of claims 30-34.

36. A polyketide compound as produced by a synthase
according to claim 35.

37. A vector containing a DNA sequence of any of
claims 30-34.

38. A transformant cell transformed to contain a DNA
sequence of any of claims 30-34.

39. A method of producing S. cinnamonensis capable
of enhanced levels of production of monensin comprising
engineering it to overexpress the mon RI gene.

40. A method according to claim 39 wherein said
engineering comprises introducing at least one additional
copy of the mon RI gene as shown in the appended sequence
data or a variant thereof.

41. S. cinnamonensis containing multiple copies of
the mon RI gene as shown in the appended sequence data
and/or variant(s) thereof.

42. A method of producing monensin comprising
culturing the organism of claim 41 and/or an organism
produced by the method of claim 39 or claim 40.

43. A process for expressing a gene heterologous to
S. cinnamonensis comprising transforming S. cinnamonensis
with DNA encoding a heterologous gene and expressing said
gene under control of the activator gene mon RI or
actII/orf4.



44. A process according to claim 43 wherein said
heterologous gene is a PKS gene.

45. 13-Propyl erythromycin A.

SEQUENCE LISTING

1 GATCAGCGCG GTGGCGTCGT CGGCGTCCAG CTCGTTCTGC GTGGCGGACG 

51 GCAGCGCGAT GTCGGCAGGC ACCTCCCAGA CCCGGCGGCC CGGCACGAAG 

101 CGGGCCGAGG CGCCGCGGCG CTGGGC6TAG GTGTCCACGC GGGCGCGTTC 

151 GACCTCCTTG ACCTGCTTGA GGAGGTCCAG GTCGATGCCC TTCTCGTCGA 

201 CGACGTAACC GGAGGAGTCC GAACACGTCA CGGCGTTGGC GCCCAGGCJCG 

251 GCGAGCTTCT GGATGGTGTA GATGGCGACG TTCCCGGAGC CGGACACGAC

301 CGCCGTCCG6 CCTTCGAGGG tCTCGCCGCG CTCACGCAGC ATCGCCGCCG 

351 CGAAGAGGAC GTTGCCGTAG CCGGTCGCCT CCGGACGGAT CAGGGAGCCG 

401 CCCCAGTTGC GGCCCTTGCC GGTGAGGACG CCCGCCTCCC AGCGGTTGGT 

451 GATGCGCCGG TACTGACCGA ACAGATAGCC GATCTCCCGG CCGCCGACGC 

501 CGATGTCGCC CGCGGGCACG TCCGTGTGTT CGCCGATGTG CCGGTACAGC 

551 TCCGTCATGA ACGACTGGCA GAAACGCATG ACTTCCGCGT CGCTGCGGCC 

601 GCGCGGGTCG AAGTCGCTGC GGCCCTTGCC GCCGCCGATG CCGAGGCCCG

651 TCAGCGCGTT CTTGAAGATC TGCTCGAAGC CCAGGAACTT GATGACGCCG 

701 AGGTTCACCG ACGGGTGGAA GCGCAGGGCG CCCTTGTAiCG GGCCGAGGGC 

751 GCTGTTGAAC TCCACCCGGA AGCCGCGGTT GACCCGCACG CGACCGTGGT 

801 CGTCCTGCCA CGGCACCCGG AAGACGATCT GGCGCTCCGG TTCGCACAGG 

851 CGCTCGATCA GGCCGGCTTC GGCGTACTCG GGGCGAGCCG CGATGACCGG 

901 CGCCAGGGTC TCGAGGACCT CGCGGGCGGC CTGGTGGAAC TCCGGCTGGG 

951 CCGGGTTGCG GTGTTCGATC TCGGTGAGCA GCTGGGAGAG TGCTGTCTTC 

1001 TGCGAGAGAG CTGTCTTCGT GTCGGGTCGC GTGGTCAAAG GAGCCCTTTC 

1051 TGGCACGGCC GGGGTAGGCG CTCGGCGCCG TTGCCGTGCG CAGGGAGACG 

1101 CTCGAGCCGC AAGTATGAGG CGCATGTAAA CACAGCGACC AGCCCCCGGG 

1151 TCCAGGGAGT GACCACCATO CGAGACCGGG CCACCGGTAG GGCCACCGGT 

1201 CCGGCCTGCG GACCCCGTGT CACTTCGGGC TCGCGGCCAG GGGTGCCGCC

1251 CGGCGGACCG AATCGGCGGA GGCGGCCAGC AGTGGCATGC GGACGGCCGG

1301 GCTGGGAATG CGGTTCTGGG CGTGCAGCAC - TCCCTTGATC ACCGTGGGGT 

1351 TCGGTTCGGT GAAGAGGGCG GCGGAAAGGC GGGCGAGGTC GGCTCCGAGA 

1401 GCGCGGGCGG GTGCGGCGGA GCCGCGTCGC CACAGCGCGA TCATCTCGGC 

1451 GTAGTCGGCG GTACGCAGAT TGGCCGACGC CACGATTCCG CCGTGGGCGC 

1501 CCGCAGCGAC CAGCGGCGAG AGGACGATGT CGTCACCGCC GAGCACGGCG 

1551 2!UiGCCGGGCA GGGGCGAGTC GAGCAACTCC ATGGTGGTCG GGTCGATCGA 

1601 GCCGGTCGCG TGCTTGATGC CGACGACCTC CGGCAGGCGG CCGAGTGCGG 

1651 TGATCGTGCC CGCGCCGAGC GTCTGCCCGG TGCGGTAAGG GATGTCGTAC 

1701 ACGACCAGGG GGAGGCCGCC GTGCTCGGCC AGCGCGGCGA AATGAGCCAG 

1751 GGTCCCCGCT TCCCCGGGGC GGATGTAGGG CGGCGCGGGG ACCAGCGCGG 

1801 CGGCGACGTC ACCCCGGGCC GCCAGCTCTC GCAGGGCCGT GUTGGCGGTG 

1851 GCGGTGTCGT TGGTGCCCAC CCCGACGATG AGCGGTGCCC CGTGTGCCCG 

1901 GCACGCGGCC GAGCAGACGC GGATCACCGT CTCTCTCTCC TCGGCGGTCA 

1951 GTGTGGCGGC CTCGGCGGTC GTACCGAGGG CGACGAGCCC GGAGGCGCCG

2001 GCCGACAGCG CCTCGTCGGC GAGTCGGGCC AGCGCCTCGG GGGCCAGGCG 

2051 CAGATCGTCG GTGAACGGAG TTACCAGGGG GACGTACAGG CCGTTGAAGA 

2101 GCGGTTCGGT GGTCGGTTCG AGGCTCGATG CGAGGGTCAT GCTCTTACCC 

2151 TGGCCCACGC CACTCGGTAG ATCCATTTCA GATTCCTGCC GTCACACCTA 

2201 AGCTGAACTT ATGCTCGATG TCCGTCGCCT CCATCTGCTC CGCGAACTCG 

2251 ACCGGCGGGG CACCATCGCC GCCGTGGCCG AAGCGCTGAC CTTCACCGCG 

2301 TCCGCCGTCT CCCAGCAGCT CGGCGTGCTG GAGAGGGAGG CGGGCGTGCC 

2351 GCTGTTGGAA CGCAGCGGCA GGCGCGTGGT CCTCACGCCC GCAGGACGCT 

2401 CCCTCGTCGC ACACGCCGAC GCGGTGCTGA ACCGTCTCGA ACAGGCGGTC 

2451 XSCCGAGCT-GG <:K3GGCGCACG GGACGGCATC GGCGGGCOSC TGCGCATCGG . 

2501 GACGTTCCCT TCCGGCGGCC ACACCATCGT CCCGGGCGCG CTGGCCGAAC

2551 TGGCCTCTCG TCACCCCGCG TTGGAGCCGA TGGTGCGGGA GATCGACTCC

2601 GCGCGCGTCT CCGACGGTCT GCGGGCCGGT GAGCTGGACG TGGCCCTCGT 

2651 ACACGACTAC GACTTCGTAC CCGCGACGCC GGACACGACC GTGGACGAGG 

2701 TGCCTCTGCT CGAAGAGCCG ATGTACCTCG. TCACCCATGC CGCGGACACT 

2751 GCCACGGACT CCGGCTCCGG GAGCACACTG GCAGCGCTGC TCGGGCCCTG 

2801 TGCCGAGGTT CCGTGGATCA CGGCGCGGGA CGGCACGACC GGTCACGCGA 

2851 TGGCTGTACG CGCCTGTCAG GCCGCCGGGT TCCAGCCCAG GATCCGCCAC 

2901 CAGGTCAACG ACTTCCGCAC dOTGCTGGCT CTGGTCGCCG CCGGGCAGGG 

2951 GGCCGGGTTC GTGCCGCGGA TGGCCGCCGA GCCGAGCCCC GGGGGCGTGG 

3001 TGCTCACGAA GCTGCCGCTG TTCCGTCGCT CGAAGGTCGC OTTCCGTGCG

3051 GGCGGCGGTG CCCATCCGGC GATCGCCGCT TTCGTGGCCG CGGCGACGAC 

3101 GGCGGTCGAA CGCATGGCGG GTTCACGAGG CCCGGCCGGC GGCTCTGAGT 

3151 GAACCGGCCG ACCGTGGGAA T6TGTGTGCG CTGGGCCGCA CCATTCGTGG 

3201 CCTGGTGACG TCCTGGCGAC GTCCTGACGT CCTGATGTCC GAACGAGAAG 

3251 GCGATTTTCC GCGATGGCCG ATGACGCGTA CCTGTTCCTC CTCCCCGACC 

3301 GGCACCCCCG ACTGGGAGCG GCCCTCGGCG CCGTCGGTGC CTTGGAATGC 

3351 ACGGAAACCC CTGCGGTGCA CGCCTGGTTG CAGGCTCATG AGGCCTCCGT 

3401 GTCCTCGGAA CAGGTCAGGA TTCTGCCCGC CGATGCCGAG ACACTCATCC 

3451 CGAAGGACGC CGAGCGGCTG CCGGTGCCGT TGAGCGAGGA GGAGGCGCTC 

3501 AAGGTCGAGC AGGAGTGCGC GCCCCAGACC GTCACGGACA TGGAGAGCGA 

3551 ACTGCTCGCG TTCCGGGAGA CGACCCAGGA CTGGCAGGCC CTCGTGCACC 

3601 GGGCCCTGAC. CGCGGGCATC CCCGCGCAGC GCATCGCCCG GCTGACCGGA 

3651 CTCGACCCGG AGGAGATCGG CCGCCTGTAG GGGCTAGCGG CCGCCCAGTG

3701 CGGACACCAG GATGGC6ACC GTGACGGTGT TGAAGACGAA GGCGATGACC 

3751 GTGTTCGCCG <:GACGGTCCG •TC<3CATGTCG <:GTGAGGTGA CGTGGACATC 

3801 GGTGGTGCCG AACGTCGTCA TCGCGGCCAG GGCGAAATAG ACGTAGTCGG

3851 CCCAGGCGGG ACTCCGCTCC CCGGGGAATT CCAGTGCCCG CTCGTTCTCC

3901 ACGAGGTTGT CGGCCTGGAA GGTGACGGCG AAGGCCACGA CCACGCAGAT 

3951 CCAGGCGGCG ACGACCAGGG CGAGCGCCAC CAGGGTGCGG GGGAGCGCGG 

4001 AGAAGGTGGT GCTGA6GTGG CCGGGAAGCC ACAGCACCGC CACCACCAGC 

4051 GCGGCAGCCG CGATGAAGAG CGAACCCCCG GGGCCGGGCG CGGTTCCGAG 

4101 GACGTAACGC TGCAGGAATG TGCCGCGGGC TTCGCGCCGC GCCCAGGAGC 

4151 GGACCTGCTC CGGAGCGACG CTCACGAAGA CGGTCATGGT GATGGCGAGG 

4201 TAGGGCAGCA GGTAGGCGAA GAAGACGAGC ACGCCGACAT CCGCTGCCGA 

4251 AATCCGCACC ACGGCGTCGA TGGGGAGGAC CACTGCCGCG CACGCCGCGA 

4301 CGGCCAGGCT CACCGCCGAC CGGCGCCGTT CGGAAAGCCA GCGATGCACG 

4351 GACGAGCCTC TCTGGTCGGG CGTCGGGCCT CGTGTGATCG TGACCGGCTC 

4401 CGCGCCCGCC GAAAGOGCGG TGCGATCTCC TGCCCTCGAA CGAGCGAAAC 

4451 GCTTGCGCCG GAAAGCCTCC CTGCTGATGC CGACGG,COGC GGCAGTGGCT 

4501 . GCGGATGCGG ATCGTGCGCT GTGCCCTGAC ' CCTGGATGGG GGGAGGAACG 

4551 CAGAGAGGCA GGTGCGCCCA TGACGGTCAT GGACAAGCTC AAGCAGATGC 

4601 TCAAGGGGCA CGAGGACAAG GCCGGCCAGG GAATCGACAA GGCGGGCGAC

4651 TTCGTCGACG GGAAGACGCA GGGCAAGTAC AGCGGtCAAG TCGACACGGC 

4701 CCAGGACAAG CTCCGGGACC AGTTCGGCTC GGATCAGCAG GAGCCTCCGC 

4751 AGAGGTAGGC AGCGTCAGGG CGGAATCGGT CCGGGCGACC GCTGACCGCT 

4801 GATGCAGATG CCGCAGACGT CGGCCCCGCA CTCCTCCGGG TAAATCGGAG 

4851 CGTAGGCGGG GCCGACGTGT GCGCGTGCGG CCTCGTCTCT GCCGCCCCTC 

4901 TCCGCCCCGT CTCTGGCCCC TTGGTGCCAG. TCTGACGGGA AAATGGCACC 

4951 ACTTGGTGCC ACGCATGTGC CATGATGGCG TCATCGAGAG CGCGCTGCCC 

5001 GGAGTCGCGG GCAGGAAGGG CGCGTTCCGC GGAGTCGGCG GTCGGAGGGG 

5051 TTGCATCATG GGGACAGCAC AGAGCCAGGA GCAGGCCGCC GCGCGCGGTG 

5101 CCTGCGCCGC CTTCGTCCGC TTCGTGCTCT GGGGTGGCGG AGTGGGCCTC

5151 GCCTCCAGCT TCGCCGTGGT CGCCCTCGCC TCCTGGGTTC CCTGGGCGCT

5201 GGCCAACGCC CTGGTCGCCG TGGTCTCCAC .CGTCGTCGCC ACCGAGCTCC 

5251 ACGCCCGCTT CACCTTCGGT GCGGGCGGGC GCGCGACCTG GCGGCAGGAC 

5301 GCGCAGTCGG CCGGGTCCGC GGCGGCCGCG TACGCGGTGA CCTGCGTGGC

5351 GATGTTC6TC CTGCAGCAGC TGGTGGCGGC GCCCGGCGCG GTGCTCGAGC 

5401 AGGTCGTGTA CCTGTCGGCC TCCGCGCTCG CCGGTGTCGC GCGGTTCCJTG 

5451 GTGCTGCGCC TCGTCGTCTT CGCCCGGAAC CGCTCGCTGC CCGCCGCGGC 

5501 CGCCGTGCGC ACCGCGCGTC 'CCGTGCGTCG CGTGCCGGCG CCCGTGCCCG 

5551 CGACCGTGGC CCACGCCGCA TCGCGCCCGG CCGGCCCCGC GGCGCTCTGC 

5601 CCCGCCGCAT GACTCCGTGC CCGCATGTTT GTGCCCCCGG TGCTCCGTGC 

5651 GTCCGGGGGC GGGGTGGGCG TCGTGCCCGG GTGGTCCAGG GGTCACGCGG 

5701 TGGTGTGTGC CAGTTCCTGG CCGAGGTGGT GGGCGAGCTG TGCGGGCGTG 

5751 GGGTTCTCGA GGATGGCGAC CATCGCGATC tccatcccgg tcagcgtcat 

5801 CAGTGTCTTG GTGAGCTCAA GGGCGGTGAG GGAGTTGAGA CCGTTCTCGA

5851 GGAAGTTGCT GTCGTCGCTG AGGGTGGTGT TCAGAAGGGT GCCGGCCTGG 

5901 GTGCGGATGG TGTCGGTGAG GAGCTTCTCG CGCTCCTCGG GGGTGGCCGC 

5951 GGCGAGCTGC TTCTCCAGCT CGGTGGCGTC CTGGCCGQAG GTGTGGTCGG 

6001 TGCTGGTCAT GACTGCTCCT GTGTGAGTGA GGTGTTGGCG GGGGTCACAC 

6051 CGCGGCGTGC GCGGTGTGGT CGTGCAGCCA GTAACGCGTG GCCTGGAAGG 

6101 AGTACGTCGG GAGGTCGATG <3TCCGGGGGT GGGGGGTGCG . CCGGACGAGA 

6151 GGGGTCCAGT CGACGGTGCC GCCCGTGGTG TGAAGCCGCG CGAGGGCGGT 

6201 CAACAGGGCG CTTACGGCGG AGGTTTGCGT GCCTTCCGGT GAGAGCGCGC 

6251 CCAGGTGGAG AAGCGTGTGG GTCTCGGGGG TGGGGGGTGC GGTGGGGGCC 

6301 GGCGAGGTGA GGTGGTGGTC CCAGTAGTCG GCGGAGGCGA TGGGGGTGTC 

6351 GGCCGGGGCA GTOCTGGTGA GCGTGAGCGT OGCGCGTTGG AACGTCAGCT

6401 <3CTTCAGCAC <3GGCTCGTAG GOGTCGGGCG GAGCCGGTTG TTCACCCTCG 
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6451 GCGGCCTGGG CGGCAGCGGC GTGGGCGGCG GCCAGGCGGC ACGCGTCGTC 

6501 GAGGGTCAGG ATTCCCGCGG CGTACGCGGC GGCGATGTGG CCGACACCGT 

6551 CGCCGGTGAG GGTGTGGGGG CGTACCCCCG TTTCCAGGAG CAGCCGCGCG 

6601 AGCGCGGTGT GGACCGCGAA GCGCGCCAGT TCGGAGTGGG GAGTGGGGAG 

6651 GGGAGTCGGC AGATGGGTGT CGAGGAGCGC GCGCGCTTCG TCGAAGGCGG 

6701 ACGCGAAGAG CGGGAACGCC GAGTGGAACT CGGCACCTCC GAAAGCCd&G 

6751 CCGAATGTAG CGCCGAATGT CGCGCCGGGT TTGGCTCCGG GTGCCGCCCC 

6801 CGTCGTCACC CCGTCGGCCG GGCGGCCGTC GAAGTGCCAG GCGATCTTCT 

6851 TCGGGCCGGC. CCCGGGCGTG GACCTGACCA GGTCCGGGTG GTCCTCTCCG 

6901 GCGGCCAGGG CGCGGGCGGC GGCGAGGAGT TCGGTGTGGT CGGTGCCGGT 

6951 GAGGACGGCG CGGTGTTCCA GGGGGCTGQG OGTGGCGGCG AGCGAGTAGG 

7001 CGACCTCGGC GGGGGAGGGC GCGGGGTCGG TGGCCGCCAG GTGGGTGACG 

7051 AGGGCCTTCG CCTGTGCCCG CAGGGCCTCG GGTGTACGAG CGGACAGGCT

7101 CCAGGCCACC GGGAGTTCCG GGGCAACGGG CGACGTCTGG TCGCGGGCGG

7151 CATCCGGCAC CGGAGCCTCG TCCACCGGCG GCTCTTCGAG GATGAGGTGC 

7201 GCGTTCGTGC CGGACGTGGC GAAGGCGGAG ATGCCGACCC <3GCGGGGCTC 

7251 .CTCGCGGCGG GGCCAGTCGA CCGCCTCGGT GAGCAGCCQT ACCGCGCCCT 

7301 TCTTCCAGGC. GGCGAGGGGC GTCGGGCGGT CGACGTGGAG GGTCGGCGGC ' 

7351 AGGGTGCCGT GCCGGAACGC CTGGACCATC TTGATGAGCG CGGCCGCACC 

7401 CGCGGCCCCC TGCGTGTGCC CCGTGTTGGA CTTGACGGAG CCGAGGCACA 

7451 GGGGCCGGTC GGGGGAGCGG TGGGCGCCGT AGGTGGCGAG GAGGGCCTGG 

7501 ACCTCGATGG CGTCGCCGAT GGGGGTGCCC GTCCCGTGCG CCTCGACGGC 

7551 <3TCGATCTGG TCCGGGGTGA GCCCGGCGTC GGCGAGGGCG GCGCGGATCA 

7601 CATGCTGCTG GGAGGGGCCG TTGGGGGCGG CGAGGCCGTA TCCGGCGCCG 

7651 TCCTGGTTGA CCGGGGAGCC -GCGGATGACG GGGAGCACCG GGTGGGCGTT 

7701 CTTCCTGGCC TC<3CGGAGCC GCTCAAGCAG GACGAtSGCCG ACGCCTTCAC

7751 CGAGGCCCAT GCCGTCGGCC GCGGCGGCGA ACGGTTTGCA ACGGCCGTCC

7801 TGCGCGAGCG ACTTCTGGTG GGCGAAGGCG TGGAAGGTGT GCGGCGTCGA 

7851 CATGACGGTG CCGCCGCCGG CGAGGGCGAG GCCGCACTCC CCGCGGCGCA 

7901 GCGCCTGGCA GGCCAGGTGG AGGGCGACCA GGGAGGACGA GCAGGCCGTG 

7951 TCCACGCTGA TGGCGGGGCC CTCQAGGCCG AGGGCGTAGG CGATGCGGCC 

8001 GGAGACGAGG CTGCCGGACG TGCC6CCGCC CAGATAGGGC AGCAGCTCGT 

8051 CGGGCGCGGT CTCGAGCCGT OTCGCGTAGT CGTGCCCGGT GGCGCCGACG 

8101 TAGACGCCGG T6AGGGTGGA GCGCAGGGTG TGCGGGGCGA TGTGGCCGCG • 

8151 TTCGACGGTC TCCCACGC?GA GGTGGAGCAT GAGGCGCTGG AGGGGTTCGG 

8201 TGGCCACGGC CTCGGTGTCG CTGATGTCGA AGAAGCCCGC GTCGAAGCCG 

8251 GCCGCGTCGT CCAGGAACCC GCCGAGCTCC GCGTACGGGC GTTCCTCGGG 

8301 GAGTTCCCAG GCGCGGTCGT C6GGGAAGCC GGTGACGGCG TCGCGGCCCT 

8351 CGGACACCAG ATCCCACAGG TCGTCCGGGG TGCGGGTCTT GCCGGGCAGC 

8401 CGGCAGGCCA TGGAGACGAC GGCGATCGGC TCGTGCTGTG CGGCCTTCAG • 

8451 TTCGCGCAGC TGCTGCTGGG CCTGGTGGAG CTCGGCCGTC GTCCACTTGA 

8501 GGTATTCGAC GAGCTTCTCT TCGTTCGCCA CGGGAATGGT CAGCCTTCCT 

8551 GTTCTCGCGC GTGAAGCCTC AGGTGGGACG AGGTCGGGCA AGGTGGGCAG 

8601 GCAGGAGCCG CGCGCTGTQG GTGCCAGGGT CGCCGCGGCT GCTTAAGCGG 

8651 GTCTAACTCC CGCCTTGCCG CCGGGCATCG CCTCGCACGA GCGGGCC^C 

8701 AGCAGGAGGT CGGCGGCGAT CTCGTCGGGT GCGCCGGCGT GCAGATCGTG 

8751 GTCGGAGCCC GGGTACCAGC GCACGCTCAC CTGCTCCAGG GCCGCCTCGG 

8801 CGGCGGCCAC . CCAGGCCCGT ACCTGGTCGG ACAGTTGGGG GATGGCGGGG 

8851 ATGAGGGGCA GCAGCCGCAC CGGCACGGTG ACCTTGGGAT ACCAGTCGGC 

8901 CGGTGCCTCC GGTTGCAGGC CGGCGACGAT CGACATGACC TGTGTCGAGG 

8951 TCAGGCGGGG <3ATGAGCAGG CCGTCCGGCC ■ CGACGCGGTA GTCCGCCAGG 

9001 GGTGCCTCGA TGGACGTGGG CGACCAGTCG <3GATGGGTGG CCCGCAGGTA

9051 GGCGCGCATG TCGGCGGCGC TGGTGGTGCC CTGCTGGGCG CGCCGGACCA

9101 CGTCGGCGGT GCGCTCCCAG AAG6CGCGCA TCACCGGTCC GTCGAACTCG 

9151 TACCAGCCGC CGTCGATCAG GGCGAGACCG GCCACCAGGT CCGGGTGCTC 

9201 GGCCGCCAGG CGCAGCGCGA GGTGCGCGCC CCAGGAGTGC CCGGCCACCA 

9251 GTGCGCCGGA CAGGTCGAGG GCGGTGACGG CCGCCACCAG GTCGGTGACG 

9301 ACCGTCGCGT TGTCGTACCC GTCGGGCGGG GTGTCCGAGT CGCCGTg3CC 

9351 GCGGTGGTCG ACGGCGTAGG CCGGGTGTCC GGCGGCGGCG AGACGGGCGG 

9401 CGACCTCGTC CCACATCCGG GCGTTCGACA GCATGCCGTG CAGCAGCAGG 

9451 AACGGACGGC CCGGGGCTCC CG6CCCGTCC -GCGGGCCGGT ACCTGACATT 

9501 GAGGGAGACG GTCTGCGACA CGGGGATGCG GAGGTTCTTC ACAGGCGGGC 

9551 CCTTGTGATC CCTTGTGCTG GGGGAGGAAA GCGGGGGCGG CACGCTCAGG 

9601 GGCGCTGCGC GGTCGCGAAG ATGTATCCGA GCTCGGGCAT CTTGCCGAGG 

9651 GCCGCCTGGT TGTGCAGGAA - CAGCTCGTAT CCCTCTACGC CGATGATGTC 

9701 . GACGTACTCG TCCCGGTGGG GGCGGATCCA -CTCGACGTAA CCGTCGTAGG 

9751 TCTTGGCGGT CTCGCGGGTG ATGTCGGTCA GTTCGAGGAC GGTCCAGGCG 

9801 GC?GGCGCGGA AGATGTCGGG GTAGTCCCCG ATGTCGGTGA GCGCGGCGTA 

9851 GATCGTGGTG TCGCTGACGG TGGCGGTCCG GGGCCGGCTG GGATCGGGGT 

9901 TGAGGTAGAC CATGTCGGCG ATCGGCATCC GCGCGCCGGG CTTCACGACG 

9951 CGGTGGGCCT CGGTGAGCAC CTGCTGCTTG TCCGGCATGT GCAGCATGGA 

10001 CTCCAGGGCC CAGCAGTGGT CGAAGGAGCC GTCGTCGAAC GGCAGGTTCA 

10051 TGGCGTCGAC CTGCTCGAAG CGGACCCGGT CGGCGAGGCC GGCCTCGCGC 

10101 GCCCGGCGGT TGCCGCGCTC GACCTGGCGG GCGGTGACGG AGATGCCGAC 

10151 CACCTCGACG TCGCGGGCGC <3GGCCAGCTG CATGGCCGGG GTGCCGTTGC 

10201 CGCAGCCGAT GTCGAGGACG CGGTOGCCGG GGGCCGGGTC GAGGCGGGGG 

10251 ATCATCTCGT -CGGTCATCTG GACCATGGCC TCGTCGAACG TGGCCTGCTG 

10301 CTCGCCGCCG TCGAACCAGT AGCCGTAGTG CAGATTGCCG TCTCCOAGCT

10351 GAGTCATCAG GTCGAAGACC TTGTGGTCGT AGTAGTGGCC GATGTCGCTG

10401 GGCTCGGGGG CGACGGTCTT GTTCACCGTC GGGGGCTTCT TGGTCGTCGC 

10451 GTTCTTCGTC ACGGCTTCAG CGTCACCGTG CGGCGGCAGG CGCCACAACC 

10501 CCACCCCCGC CCCTCAAAAG CCCCTATGGG CCCTCCTCGA CCGCCCCTAG 

10551 GGAGCTGCTC TTGACGCGTT CCATACGGAA CGGGTGGTAC CCCTCCGAAA 

10601 AAAATGAGAG TACGCTCCCA CTAGATATTG AGCTCTCTTT AGGAGGTC?GA 

10651 CTCCCATGTC .TGCTGATCTG . GGTGCGCGGC GGTG6TGGGC CGTCGGTGCT 

10701 CTCGTACTCG CCTCGATGGT- CGTGGGCTTC GATGTGACGA TCCTGAGCCT 

10751 GGCGTTGCCC GCCATGGCCG ACGACCTCGG CGCGAACAAC GTCGAGCTGC 

10801 AGTGGTTCGT GACGTCGTAC ACGCTGGTGT TCGCGGCCGG CATGATCCCG 

10851 6CCGGCATGC TCGGTGACCG GTTCGGACGC AAGAAGGTCC TGCTCACCGC 

10901 CCTGGTGATC TTCGGTATCG CCTCGCTGGC. CTGTGCCTAC GCGACGTCCT 

10951 CCGGCACCTT CATCGGCGCG CGTGCGGTGC TCGGTCTGGG CGCCGCGCTG 

11001 ATCATGCCGA CGACGCTGTC GCTGCTGCCG GTCATGTTCT CCGACGAGGA 

11051 GCGGCCGAAG GCCATCGGAG CGGTGGCCGG TGCGGCGATG CTCGCCTATC 

11101 CGCTCGGCCC GATCCTCGGC GGCTACCTGC TCAACCACTT . CTGGTGGGGC 

11151 TCCGTCTTCC TGATCAACGT GCCGGTGGTG ATCCTCGCCT TCCTCGCGGT 

11201 CTCCGCCT6G CTGCCCGAGT CCAAGGCCAA GGAGGCCAAG CCGTTCGACA 

11251 TCGGCGGCCT GGTGTTCTCC AGCGTCGGTC TCGCCGCGCT GACCTACGGC 

11301 GTGATCCAGG GCGGCGAGAA GGGCTGGACG GACGTCACCA CGCTGGTGCC 

11351 GTGCATCGGC GGTCTGCTCG CCCTC6TGCT GTTCGTGATG TGGGAGAAGC 

11401 GGGTGGCGGA CCCGCTGGTC GACCTCTCGC TGTTCCGCTC GGCCCGGTTC 

11451 ACCTCCGGCA CCATGCTCGG CACCGTCATC AACTTCACGA TGTTCGGCGT 

11501 GCTCTTCACG ATGCCGCAGT ACTACCAOGC GGTCCTGGGC ACCGACGCGA 

11551 TGGGCAGCGG CTTCCGGCTG CTGCCGATGG TCGGGGGTCT GCTCGTGGGT

11601 GTGACGGTCG CCAACAAGGT CGCCAAGGCC CTCGGCCCGA AGACCGCGGT

11651 CGGCATCGGC TTCGCCCTCC TCGCCGCCGC CCTGTTCTAC GGCGCCACCA

11701 CGGACGTCAG CAGCGGCACC GGCCTGGCGG CCGCCTGGAC CGCGGCCTAC 

11751 GGACTCGGCC TCGGCATCGC CCTGCCGACC GCCATGGACG CCGCCCTCGG 

11801 CGCGCTCTCC GAGGACTCCG CCGGCGTCGG ATCCGGCGTC AACCAGTCCA 

11851 TCCGTACCCT CGGCGGCAGC TTCGGCGCGG CCATCCTCGG TTCCATCCTC 

11901 AACTCCGGCT ACCGCGGCAA GCTCGACCTC GACGGCGTGC CCGAGCAC^C 

11951 ACACGGCGCG GTCAAGGACT CCGTCTTCGG CX3GCCTCGCG GTGGCCCGGG 

12001 CGATCAAGTC CAACGGACTG GCCGACTCGG TGCGTTCCGC GTACGTCCAC 

12051 GCCCTGGACG TGGTCICTCGT GGTCTCCGGC GGCCTCGGAC TGCTGGGTGT 

12101 GGTGCTGGCG GTGGTGTGGC TGCCCCGCCA TGTTGGTCAG AGCACCGCCA 

12151 AGACAGCAGA ATCTGAGCAT GAAGCCGCAG ACGCAGTCTG ACCAGGGCAA 

12201 AACAGTGCCT GGTCTGAGAG AACGCAAGAA GGCCCGGACG AAGGCCGCGA 

12251 TTCAGCGGGA GGCGGTGCGC TTGTTCAGGG AACAGGGCTA CACCGCCACG 

12301 ACCATCGAGC AGATCGCCGA AGCCGCCGAG OTCGCTCCCA GCACCGTCTT 

12351 CCGCTACTTC GCGACCAAGG AGGACCTGGT CTTCTCGCAC GACTACGATC 

12401 TGCCCTTCGC GATGATGGTC CAGGCCCAGT CACCCGACCT GACGCCGATC 

12451 CAGGCCGAGC GGCAGGCCAT CCGCTCGATG TTGCAGGACA TGAGCGAGCA 

12501 GGAACTGGCC CTGCAGCGCG AGCGGTTCGT CCTGATTCTC TCCGAGCCGG 

12551 AGCTCTGGGG CGCCAGCCTC GGCAACATCG GCCAGACCAT GCAGATCATG 

12601 AGTGAGCAGG TGGCCAAACG GGCCGGGCGC GACCCGCGGG ACCCCGCGGT 

12651 CCGCGCCTAC ACCGGAGCCG TGTTCGGAGT GATGCTCCAG GTCTCGATGG 

12701 ACTGGGCCAA CGATCCGGAC ATGGACTTCG CGACCACGCT GGACGAGGCA 

12751 CTCCACTACC TGGAAGACCT GCGGCCCTGA CCGAAGGGGC GGGCGCACAC 

12801 CACAGAGCCC GCCGCGGCCA GACGTGGTAC <5AGGCGCCAT CGGCCGTCGC 

12851 GTACGACCCC CGCGCCCCGG ATTCCCCGGC GGGGCGCJGGG GTCAAGGGAA 

12901 AAGAGACGAC CGCACGCXSGC CACTGTTCCC CCGGCTGCCG CGTCCGGTCC

12951 AACCTGGCGT GCTCCGGCTT CCCTCGACGG AGCACGCCAG GGGTCTGTCC

13001 GGCCCTCTCC CGGCGGCTCC CGTCAGACGC CCGGCCCCGC CGTCAGCGCC 

13051 TCGGTCACGA CGGCCGCCAG CTGCTCCTGA CAGCCGTCGA GGTAGAAGTG

13101 CCCGCCGGGC AGCACCCGCA GATCGAACGC CGCGCCGGTC CGCTCGCGCC • 

13151 ACGTGGCGGC CTGCTCCGGC GACGTCCGCT CGTCGGCGTC CCCGATCAGC 

13201 GCCGTGATCG GGCAGTCGAG CCGGCCGGGT CCCGGCGCCT CGTAGGTG'gC 

13251 CACGGCCCGG TAGTCGGCCC GCAGCGCGGG CAGGACGAGC TCCTGCAGCT 

13301 CGGGACTGCG GAAGAACCGC TCGTCGGTGC CGCCCATCGC CCGCAGATGG 

13351 GCCAGGATGT CCGCGTCCCC GAACGGCCCC GAACGCCCCG CGGGACGGTA 

13401 GGGCCGCGCG AGCCCCCCGG AAACGAACAG GTGCACGGGA aggccgggcc 

13451 cggccggtcc ccgcagccgc cgcgccacct cgaacgccac gatggcgccc 

13501 aggctgtgcc cgaacagcgc gaatggcttc ccgtcgcacg gcaggtgggg 

13551 cacgacgccg tcggcgagct cggccaccga cgccaggcac ggctccgcat 

13601 gacggtcctg ccgccccgga tactgcacgg cgagcacctc gacgccgggc 

13651 gcgagcagcc cggagagccc gaagtagtaa ctcgccgaac cgcccgcgaa 

13701 CGGAAAGCAG ACCAGCCGCA CCGGCGCCTC TGCCGCAGCG TGGTACCGCC 

13751 GCAACCACAC CCCGTTTCCG GTGGCTGCAC CGAACTCGTC ACCGATCTGT

13801 GGTGCCCGCG CCGCCGTGCC CCTGTCCATC GTTCTCCCTC TCCTCGCGTC 

13851 GCTCCGCGGG CGCTGTCCTG CCCCGCCCCG AAAGCCCGAT GCCGGGCAAG 

13901 CCCCGATGCT GGCCAAACCC CGATGCCGGC CAAGCCCCGA TGCTGGCGGC

13951 GGCCCATAGC GCCCGGCTAA AGCCGCAGGC GGCTAGCCGG GGTTTGGTTC 

14001 GCCTTTAGAC AGCCCACCCA GGATGAGCCC GGTAGTCGAA GCGATCTCCG 

14051 ATTTCGGACG GGGAGCGCCG TTGATGTTTT GTGGCAGCCA GTTGTTCAGC

14101 GCCCGACCGC AGCTGACGTG ATGGCCGCAT CCGCGTCAGC GTCCCGCTCG

14151 GGACCGAGCG CAGGACCGGA <:CCGATCGCC GTGGTCGGQA TGGCCTGCCG

14201 CGTGCCGGGA GCACCTGACC CCGAGGCGTT CTGGCGGCTG CTCAGCGAGG 
14251 GGCGCAGCGC GGTGAGCACC GCACCGCCCG AGCGGCGGCG AGCCGACTCC

14301 GGCCTCCACG GGCCGGGCGG CTACCTGGAC CGGATCGACG GCTTCGACGC 

14351 GGACTTCTTC CACATCAGCC CGCGCGAGGC CGTGGCGATG GACCCCCAGC 

14401 AGCGGCTGCT CCTCGAACTG AGCTGGGAGG CCCTCGAAGA CGCGGGCATC 

14451 CGGCCGCCCA CCCTGGCGCG CAGCCGCACC GGCGTCTTCG TCGGCGCGTT

14501 CTGGGACGAC TACACCGACG TCCTGAACCT GCGGGCGCCG GGCGCCGfCA

14551 CCCGCCACAC CATGACCGGC GTGCACCGCA GCATTCTGGC CAACCGCATC 

14601 TCGTACGCGT ACCACCTGGC CGGTCCGAGC CTCACCGTCG ACACCGCACA 

14651 GTCCTCCTCG CTCGTCGCCG TCCACCTGGC CTGCGAGAGC ATCCGCAGCG 

14701 GCGACTCCGA CATCGCCTTC GCGGGCGGCG TCAACCTCAT CTGCTCGCCG 

14751 CGCACCACCG AGCTGGCCGG GGCCCGCTTC GGCGGTCTCT CGGCCGCAGG 

14801 CCGCTGCCAC ACCTTCGACG CCCGCGCCGA CGGTTTCGTA CGCGGCGAGG 

14851 GCGGCGGCCT CGTGGTGCTC AAGCCCCTCG CGGCGGCACG GCGCGACGGC

14901 GACACGGTGT ACTGCGTGAT CCGGGGGAGC GCCGTCAACA GCGACGGTAC 

14951 GACCGACGGA ATCACCCTGC CCAGCGGGCA GGCGCAGCAG GACGTGGTGC 

15001 GCCTCGCCTG CCGACGGGCG CGGATCACGC CGGACCAGGT GCAGTACGTC

15051 GAACTGCACG GCACCGGCAC GCCCGTCGGG GACCCGATCG AGGCCGCCGC

15101 GCTCGGCGCC GCCCTCGGGC AGGACGCCGC CCGCGCCGTG CCGCTGGCCG 

15151 TCGGCTCCGC CAAGACGAAC GTCGGCCACC TCGAAGCCGC CGCCGGAATC 

15201 GTCGGACTGC TCAAGACCGC CCTGAGCATC CACCACCGGC GGCTGGCGCC 

15251 GAGCCTGAAC TTCAGCACCC CCAATCGGGC CATCCCGCTC GCCGACCTCG

15301 GCCTGACCGT CCAGCAGGAC CTGGCCGACT GGCCGCGCCC CGAACAGCCC 

15351 CTGATCGCCG GGGTGTCGTC CTTCGGCATG GGCGGCACGA ACGGTCACGT 

15401 TGTCGTGGGG GCGGCGCCCG ATTGGGTGGC GGTACCTGAG CCGGTGGGGG 

15451 TGCCTGAGCG GGTGGAAGTC CCTCAGCCGG TGGTGGTTTC TGAGCCGGTG

15501 GTGGTGCCGA CGCCATGGCC CGTCAGCGCT CACAGCGCTT CCGCGCTGCG 
15551 CGCGCAGGCC GGTCGCCTGC GGACGCACCT CGCCGCCCAC CGCCCCACCC

15601 CCGACGCCGC GCGGGTCGGC CACGCGCTCG CCACCACCCG TGCGCCCCTC 

15651 GCCCACCGCG CGGTCCTGCT CGGCGGCGAC ACCGCCGAAC TGCTGGGCTC 

15701 CCTGGACGCG CTGGCCGAGG GCGCGGAGAC CGCGTCCATC GTGCGCGGCG 

15751 AGGCGTACAC CGAGGGCAGG ACGGCCTTCC TCTTCAGTGG GGAGGGAGCG 

15801 CAACGCCTCG GCATGGGGCG GGAGTTGTAT GCCGTGTTCC CCGTCTTdsC 

15851 CGACGCTCTC GACGAGGCGT TCGCCGCCCT GGACGTACAT CTGGACCGCC 

15901 CACTGCGCGA GATCGTCTTG GGCGAGACCG ACTCGGGTGG GAACGTCTCG 

15951 GGTGAGAATG TCATCGGCGA GGGTGCCGAC CATCAGGCAQ TCCTCGACCA

16001 GACCGCCTAC ACCCAGCCCG CGCTCTTCGC GATCGAGACG AGCCTGTACC

16051 GGCTGGCAGC CTCCTTCGGC CTGAAGCCGG ACTACGTCCT CGGCCACTCG 

16101 GTCGGCGAGA TCGCCGCCGC GGACGTCGCC GGTGTCCTCT CGTTGCCGGA 

16151 CGCGAGCGCT CTGGTGGCCA CGCGGGGACG GCTCATGCAG GCGGTTCGCG 

16201 CGCCCGGCGC GATGGCCGCG TGGCAGGCCA CGGCGGACGA GGCGGCCGAA 

16251 CAGCTCGCCG GGCACGAGCG GCACGTCACC GTGGCCGCCG TCAACGGCCC

16301 CGACTCCGTG GTCGTCTCCG GCGACCGCGC CACCGTCGAC GAACTGACCG 

16351 CCGCCTGGCG GGGACGCGGC CGCAAGGCCC ACCACCTGAA GGTCAGCCAC 

16401 GCCTTCCACT CCCCGCACAT GGACCCCATC CTCGACGAGC TGCGCGCGGT 

16451 CGCCGCCGGC CTGACCTTCC ACGAGCCGGT CATTCCCGTC GTCTCCAACG

16501 TCACCGGTGA ACTGGTGACC GCGACCGCGA CCGGGAGCGG CGCCGGGCAG

16551 GCGGACCCCG AGTACTGGGC GCGGCATGCG CGCGAGCCCG TGCGGTTCCT 

16601 GTCCGGGGTG CGGGGGCTGT GCGAGCGCGG GGTGACCACG TTCGTCGAGC 

16651 TCGGCCCGGA CGCACCGCTG TCCGCGATGG CCCGCGACTG CTTCCCCGCC

16701 CCCGCGGACC GGAGGCGTCC GCGCGGCGCC GCCATCGCCA CATGCCGCCG 

16751 CGGGCGCGAC GAGGTGGCCA CGTrCCTGAG GTCGCTGGCC CAGGCGTACG 

16801 TCCGGGGCGC CGATGTGGAC TTCACCCGGG CCTACGGCGC CACCGCCACG
16851 CGCCGCTTCC CCCTCCCCAC GTATCCCTTC CAGCGCGAGC GCCATTGGCC
16901 TGCCGCTGCC GGGGTGGGGC AGCAGCCGGA GACCCCGGAA CTTCCGGAAT 
16951 CCTCGGAGTC CTCGGAGCAG GCAGGGCATG AGCGGGAGGA GGGGGCGCGC 
17001 GCGTGGGGCG GGCCTGAAGG GCGGCTTGCC GGGCTCTCCG TGAACGACCA 
17051 GGAGCGGGTC CTCCTCGGCC TGGTCACCAA GCACGTGGCC GTCGTGCTCG 
17101 GGGACX^CCTC GGGCACGGTA CAAGCCGCCC GCACCTTCAA GCAGTTGG'&C 
17151 TTCGACTCGA TGGCCGCCGC CGAGCTGAGC GAACGGCTCG GCACGGAGAC 

17201 GGGCCTGCCG TTGCCCGCCA CCCTCACCTT CGACTACCCG ACCCCTCTGG 

17251 CCGTCGCCGC GCACCTGCGC GCGGAGCTCA CCGGTACGCC CGCCCCGGCC

17301 GGCTCCGCGC CCGCCACGGG CGCCCTCGGC GCGGGTGACC TCGGCACGGA 

17351 CGAGGACCCG GTCGCCATCG TGGCCATGAG CTGCCGCTAT CCCGGCGGCG 

17401 CAGGCACGCC CGAGGACCTG TGGCGGCTGG TCGCGGACGG CGCCGACGCG 

17451 ATCGGAGACT TCCCCACCGA CCGCGGCTGG GACCTGGCGC GGCTGTTCCA 

17501 CCCCGACCCC GACCGGTCGG GCACCAGCTG CACGCGGCAG <3GCGGATTCC 

17551 TGTACGACGC CGCCGACTTC GACGCCGAGT TGTTCGACAT CAGCCCGCGC 

17601 GAGGCCCTGG CCGTCGACCC OCAGCAGCGG CTGCTCCTCG AGTGCGCCTG 

17651 GGAGGCCTTC GAACGGGCGG GCCTGGACCC GCGGGCGCTC AAGGGCAGCC 

17701 CCACCGGCGT GTTCGTCGGC ATGACGGGGC AGGACTACGG CCCCCGTCTG 

17751 CACGAGCCGT CCCAGGCCAC CGACGGCTAT CTGCTGACCG GCAGCACGCC 

17801 GAGCGTGGCC TCGGGCCGCC TGTCGTTCAG CTTCGGCCTT GAGGGGCCCG

17851 CCCTGACGGT GGACACGGCC TGCTCGTCGT CGCTGGTCAC GCTCCATCTC 

17901 GCGGCGCAGG CGCTGCGGCG CGGCGAGTGC GACCTGGCGC TCGCCGGCGG 

17951 CGCCACCGTC CTGGCCACGC CGGGCATGTT CACCGAGTTC TCGCGGCAGC 

18001 GGGGCCTGGC CCCCGACOGC C<5CTGCAAGC CGTTGGCGGC GGGCGCCGAC 

18051 GGCACGGGCT GGGCCGAGGG CGTGGGCCTG GTCCTCCTCG AAAGGCTCTC

18101 CGAGGCCCGG CGCAAGGGGC ACGCCGTCCT CGCGGTGATC CGGGGTTCGG 
18151 CGAT(2AACCA GGACGGCGCG AGCAACGGCC TGACCGCGCC CAACGGCCCC

18201 TCGCAGCAAC GCGTCATCCG TGCCGCGCTC GCGGCCGCCC GGCTCACCGC 

18251 GGACGAGGTC GACGTAGTGG AGGCGCACGG CACCGGCACC ACGCTCGGCG 

18301 ACCCGATCGA GGCGCAGGCC CTGCTCGCCH CGTACGGCCA AGGGCGTTCG 

18351 GCGGAGGGGC CGTTGTGGCT CGGGTCGGTG AAGTCGAACA TCGGTCACAC

18401 GCAGGCCGCC GCGGGTGTCG CGGGCGTCAT CAAGATGGTG ATGGCGATGC 

18451 GCCACGACCT GCTCCCCGCC ACCCTGC^CG TCGACGAGCC GAGTGGCCAC 

18501 GTGGACTGGT CCACCGGCGC GGTGCGACTG CTCACCGAGC CGGTCGTCTG

18551 GCCGCGCGGC GAACGTCCGC GCCGCGCCGC GGTGTCGTCC TTCGGCATCT 

18601 CCGGCACGAA CGCGCACCTG GTGCTCGAAG AGGCGGGGCA GGACGAGTAC 

18651 GTTGCGGGAG CCGCCGACGA CGCCGGGCCG GTGGACGGTG CTGTGCTGCC 

18701 GTGGGTGGTT TCCGGACGGA CCGGAGCGGC GCTGCGCGAA CAGGCCCGCC 

18751 GTTTGCGTGA GTTGGTGACC GGCGGCTCGG CCGATGTCTC TGTGTCCGGG 

18801 GTGGGCCGGT CGCTGGTCAC CACGCGGGCG GTGTTCGAGC ACCGGGCCGT

18851 GGTCGTGGGC CGCGACCGGG ACACGCTGAT CGGCGGCCTC GAGGCCCTTG

18901 CGGCGGGTGA CGCGTCGCCG GACGTCGTGT GCGGGGTCGC GGGCGATGTC 

18951 GGCCCCGGCC CGGTGCTGGT GTTCCCCGGG CAGGGCTCGC AGTGGGTGGG

19001 CATGGGAGCC CAACTCCTTG GCGAGTCCGC GGTGTTCGCG GCGCGGATCG

19051 ACGCGTGCGA GCAGGCGCTG TCCCCGTACG TCGACTGGTC ACTGACAGAG 

19101 GTCCTGCGCG GGGACGGGCG CGAACTGTCG CGCGTCGACG TCGTCCAGCC 

19151 CGTGCTGTGG GCGGTGATGG TCTCGCTCGC CGCCGTCTGG GCGGACCACG 

19201 GCGTCACCCC GGCCGCCGTC GTCGGGCACT CCCAGGGAGA GATCGCCGCT 

19251 GTGGTCGTCG CCGGCGCGCT CACCCTGGAG GACGGCGCCA AGATCGTGGC 

19301 CCTGCGCAGC CGGGCGCTGC GTCAGCTCTC GGGCGGGGGC GCGATGGCCT 

19351 CCCTCGGGGT GGGCCAX3GAA CAGGCAGCCG AACTCGTGGA GGGCCACCCC 

19401 <3GAGTGGGCA TCGCCGCCGT CAACGGCCCG TCATCGACCG TCATTTCAGG 
19451 CCCGCCCGAG CAAGTCGCCG CCGTCGTCGC CGACGCCGAG GCGCGCGAGC

19501 TGAGAGGCCG CGTCATTGAC GTGGACTACG CCTCGCACAG CCCCCAGGTC 

19551 GACGCCATCA CCGACGAACT CACCCACACC CTGTCCGGCG TCCGCCCCAC 

19601 CACGGCCCCG GTGGCGTTCT ACTCGGCCGT GACCGGAACC CGCATCGACA 

19651 CGGCGGGCCT CGACACCGAC TACTGGGTCA CCAACCTGCG CCGCCCGGTC 

19701 CGGTTCGCCG ACGCCGTCAC CX3CGCTCCTC GCCGACGGCC ACCGGGTCTT 

19751 CATCGAGGCC AGCAGCCACC CCGTCCTCAC CCTCGGCCTC CAGGAGACCT 

19801 TCGAGGAGGC CGGGGTCGAC GCCGTCACCG TCCCCACCCT GCGGCGCGAG 

19851 GACGGCGGCC GGGCACGCCT GGCCCGCTCG CTGGCACAGG CCTTCGGCGC 

19901 CGGGTGCGCG GTGAGGTGGG AGAACTGGTT TCCGGCCACC GGTACGTCCA 

19951 CCGTGGAGCT GCCGACGTAC GCCTTCCAGC GTCGCCGTTA CTGGCTGGAG 

20001 GCCCCCACGG GCACCCAGGA CGCGGCGGGC CTGGGCCTCG CCGCTGCGGG 

20051 GCACCCGCTC CTCGGGGCGG CCACCGAGAT CGCGGACGGC GACATCCGCC 

20101 TGCTCACCGG CCGTATCAGC AGGCACAGCC ACCCCTGGCT CGCTCAGCAC 

20151 ACCCTCTTCG GTGCCGCGGT CGTGCCCGCC TCCGTCCTCG CGGAATGGGC 

20201 GCTGCGCGCC GCCGACGAGG CCGGCTGCCC GCGTGTCGAC GACCTCACGC 

20251 TGCGCACCCC GCTGGTGCTG CCCGAGACCG CGGGCGTGCA GGTGCAGATC 

20301 GTGGTCGGCC CGGCCGACGC GCGGGACGGG CACCGCGACT TCCACGTCTA 

20351 CGCCCGCCCC GACGGCAAGG ACGCCTCTGA GGGCGAGGGC ATCGCCGAGG

20401 GCGAGGGTGC CTCTGAGGGC GAGGGTGCCT CCGGCGGCAC CGATGCGCCG 

20451 TGGACCTGCC ATGCCGACGG CCGACTGGTC GCCGAGCCCA CCGGCACGGC 

20501 CTCGGAGGAC TCCCCGGACA CGGTGTGGCC GCCGCCCGGC GCCGAACCCG 

20551 TCGAGCTGGG CGACTTCTAC GAGCGGGCCG CCGCCACCGG AGTCGGCTAT 

20601 GGACCGGTCT TCACGGGGCT GCGCGCCCTG TGGCGGGGGG ACGGCGAGCT 

20651 GTrOSCCGAG GCGGTGCTGC CGCAAGAAGC CCCGGAAACC GCCGGGTTCG

20701 GCATGCACCC GGCGCTCCTC OACGCCGCAC TGCACCCCGC ACTCCTCGGC 
20751 GAGCGGCCGG CCGAGGAGGA CAAGGTGTGG CTGCCGTTCA CGCTGACCGG

20801 AGTGACCCTG TGGGCCACCG GTGCCACCTC TGTACGCGTC CGTCTCACCC 

20851 CGCTGGACGA CGACCCCGAC GCGTCGGCGG ACGGGCGGGC CTGGCGGGTC 

20901 GGCGTGAGCG ACCCGACCGG CGCGGAGGTG CTGACCTGCG AGGCCCTGGT 

20951 CGCGGTGGCG GCGGGCCGCC GCGAGCTGCG GGCCGCGGGG GAGCGGGTGT 

21001 CCGATCTGTA CGCGGTGGAG TGGGTGCCGG TGCCGGGCCC GGGGCCG(!tG 

21051 GGTGAGGGTG CTGACTTCTC GGGCTGGGCC GGTCTGGGGG AGTGCGGGGA 

21101 GCGTTGGGAG TGCGTGGGGC GCGTGGAGCG CTGGTACGAG gacctggacg 

21151 CTCTCGGCGC GGCTGTCGAG GGTGGGGCT7 CGGTGCCCTC TGTCGTTCTC 

21201 GCCACCGCGG CTGCCGCCCC TGGTGGAGCG GGCGACGGAG CCGCCGATGC 

21251 GCTGAGCGCG GTGCGGTGGA CCGGCGCGCT CCTCGATCAG TGGCTCGCCG 

21301 ACGCGCGGTT CGCCGACGCC CGGCTGGTGG TGATCACGTC CGGCGCGGTC 

21351 GCCACGGGTG ACGATTTCCT TCCCGACCCG GCCGCCGCGG CGGTACGAGG 

21401 ACTGGTCGAG CAGGCGCAGG TCAGGCACCC CGGCCGCATC CTCCTCGTCG 

21451 ACACGGAAGC CGGGGCCGGG CTCGGGGTCG GCGCCGGAGT GGATGACGCG 

21501 CTCCTGGAAC AGGCCGTGGC CATGGCTCTC GGCGCCGACG AACCGCAACT 

21551 CGCCCTGCGC GCGGGGCGGG TCCTGGCGCC CCGCCTCACC GCACCCCAGG 

21601 ATGCGGCCGT CACCGAAGCG GCGCGACCGC TOGACCCGGA CGGCACCGTA

21651 CTCATCACAG GGCCGGCCGG TGCTCCGGTG GCCGACCTCG CCGAACACCT 

21701 CGTACGCACC GGGCAGTGCA GGCATCTGCT GCTCCTGCCT GGAGACGGTG 

21751 AACTGGAGGA AATGGCCGAG GAGTTGCGGG GCCTOGGCGC CACCGTGGAC

21801 CTGAGTACCG CCGACCCGGC GGACCCGACC GCCCTCGCCG AAGTGGTCGC 

21851 CGCCGTCGAG GGGGACCATC CTCTTACGGG <3GTCATCCAC GCCACCGGAG 

21901 TCGTGGAGGC OTTCGATCCC GGCGACTCGG CGAGCGACTT GATGATCGAC 

21951 TCGGGGAGCG ATTCGTTCGC CGAGGCATGG TCGTCGAGGG CGGGCGTCAC 

22001 CGCGGCACTG CACACCGCGA CCGCCCACCT TCCCCTGGAC CTGTTCGCCG 
22051 TCCTGTCCCC GGCGGGCGCG GACCTGGGCA TTGCCCGGTC GGCGGCCGCC

22101 GCGGGCGCCG ACGCCTTCAG CGCGGCACTC GCCCTGCGCC GGCACACGAC 

22151 CGTCACGACG GACACGACAG CCCCGCCGCG CACGACAGCC CCGCCGCGAA 

22201 CGACAGCCTC GCCGCGCACG ACAGCCCTGT CGTCGTCGCG CACGACGGGC

22251 GTGGCCCTCG CCTACGGGCC GCCCACCGCG CCGAGGCCCG GCATCAAGGG 

22301 GACGGCGCCC GGTCGGATCC CCGTGCTGCT CGACGCCGCT CGCGCTCAtc 

22351 GGGGCGGTTC GCCCCTGCTC GGGGCCCGCT TGGCCGCGCG TGCCCTGGCC 

22401 GCCGAGTCCG CCGCCGAGGG CGTCGCCGGC CTGCCCGCGC CGCTGCGCGC 

22451 GCTGGCAGTG GCCGCAGCCG CGGCCGGAGC ACCGACCCGG CGCACCGCCG

22501 CCGACCGCAA GCCCCCCGCG GACTGGCCGG CCCGACTGGC CCCCCTGTCC 

22551 GCCCCCGAAC AACTCCGTCT GCTCATCGAC GCCGTACGCA CCCACGCCGC 

22601 CGCGGTCCTC GGCCGCACCG ACCCGGAAGC GCTGCGCGGG GACGCCACCT 

22651 TCAAGCAGCT CGGCCTTGAC TCGCTGACCG CCGTGGAGCT GCGCAACCGG

22701 CTCGTGGAGG AGACCGGTCT GCGGCTGCCC ACCGCCCTCG TCTTTCGCTA 

22751 eCCGACCCCC GCGGCGATCG CCGCGCACCT CCGCGAGCGG CTGACCAGCC 

22801 CGAGCGAGAC GACCGCCACA CAGAGGTCCG GAGGGCAGAC GCCCGCAGCG 

22851 GGGCAGGCGT CGTCCGCGCT CGCCCCCGGC GGA1CGGCCG CCGGACCGCC 

22901 CGCCGCAGAC ACCGTGCTGA GCGACCTGAC CCGCATGGAG AACACCCTCT 

22951 CCGTGCTCGC CGCCCAGCTG CCCCAGACCG AGACGGGTGA GATCACCACC 

23001 CGGCTCGAAG CGCTCCTCAC GCGCTGGAAG ACCACGAAGG CCACGGCGAA 

23051 CGACAGCGGC GACGGCAACG GCGGCGATGA CGACGCCGCC GAACGCCTCA 

23101 AGGCCGCGTC CGCCGACCAG ATCTTCGACT TCATfCGACAA CGAGCTTGGT 

23151 GTCGGGCACG GCACCTCGCG GGTGACCCCC ACTCCGAAGG CCGGGTGACC 

23201 GCACATGGCG AGTGAAGAGC AACTGGTCGA ATATCTGCGC AGGGTGACCA 

23251 GCGAGCTCCA TGACACGCGT OGGCGCCTGG TGCAGGAGGA GGACCGCAGG

23301 CAGGAAGCGG TGGCCCTGGT CGGCATGGCC TGCCGCTTCC CGGGCGGCGT 
23351 GGCCTCACCG GAGGACCTCT GGGACCTGGT CGCCGCGGGC AAGGACGCCA

23401 TCGAGGACTT TCCCACCGAC CGGGGCTGGG ACCTGGAGGC GCTCTACGAC

23451 CCGGACCCGG CCGCGTACGG GACCAGCTAT GTCCGCCACG GCGGGTTCGT

23501 GGACGACGCG GGCTCCTTCG ACGCCGACTT CTTCGGCATC AGCCCGCGAG 

23551 AAGCCCTGGC GATGGACCCG CAGCAGGGGC TGATGCTGGA GACGTCCTGG 

23601 GAGCTGTTCG AGCGCGCCGG CATCGAACCC GTCTCCCTCA AGGGCAGCftG 

23651 TACGGGCGTC TACGCCGGGG TGTCCAGCGA GGACTACATG TCCCAACTGC 

23701 CCCGCATCCC CGAGGGGTTC (^GGGGCACG CCACCACCGG CAGCCTCACC 

23751 AGCGTCATCT CGGGCCGGGT CGCGTACAAC TACGGCCTCG AAGGCCCGGC 

23801 CGTCACCGTC GACACAGCCT GTTCCGCCTC GCTCGTCGCC ATCCACCTGG 

23851 CGAGCCAGGC GCTGCGCCAG CGTGAGTGCG ACCTCGCCCT CGCGGGCGGT 

23901 GTGCTCGTAC TGTCCAGCCC GCTCATGTTC ACCGAGTTCT GCCGCCAGCG 

23951 GGGCCTTGCT CCCGACGGCC GCTGCAAGCC GTTCGCCGCC GCGGCGGACG

24001 GCACCGGCTT CTCGGAGGGC ATCGGTCTGC TCCTCCTGGA GCGCCTGTCC

24051 GACGCGCGCC GCAACGGCCA CAAGGTGCTC GCGGTGATCC GCGGCTCCGC

24101 CGTCAACCAG GACGGCGCGA GCAACGGCCT GACCGCCCCC AACGACGCCG 

24151 CGCAGGAACA GGTCATCCGC GCCGCCCTCG ACAACGCCCG CCTCACCCCG 

24201 TCCGAGGTGG ACGCCGTCGA GGCGCACGGC ACCGGCACCA AACTGGGCGA 

24251 CCCCATCGAG GCCGGAGCGC TGCTCGCCAC CTACGGGCAA CACCGCGCCC

24301 GGGCCCTCCT CCTCGGCTCC CTCAAGTCCA ACATCGGCCA CACCCACGCC 

24351 ACCGCGGGCG TCGCCGGTGT CATCAAGACC GTCATGGCGA TCCGCAACGG 

24401 TCTGCTCCCC GCCACCCTCC ACGTCGAGGA ACTGAGCCCG CACGTCGACT

24451 GGGACGCGGG CGCGGTCGAG GTCGTCACGG AGCCCACCCC GTGGCCCGAG 

24501 ACCGGGCACC CCCGGCGCGC GGGCGTCTCG GGGTTCGGGA TCTCCGGGAC

24551 GAATGCGCAC TTGATCCTGG AGGAGGCGCC GCOGGAGGAG GATGTGCGCG

24601 CCCCGGTGGT TGTGGAGTCG GGCGGGGTCG TTCCGTGGGT GGTGTCCGGG 
24651 CGGACGCCGG AGGCGCTGCG TGAACAGGCC CGGCGACTCG GCGAGTTCGT

24701 GGCAGGCGAC ACGGACGCAC TGCCGAACGA GGTCGGCTGG TCCTTGGCCA 

24751 CGACCCGGTC GGTGTTCGAG CACCGGGCTG TGGTCGTGGG GCGTGACCGG 

24801 GATGCGTTGA CGGCTGGCCT GGGGGCGTTG GCTGCGGGTG AGGCTTCGGC 

24851 GGGTGTGGTG GCCGGGGTGG CCGGTGATGT GGGTCCTGGG CCGGTGTTGG

24901 TGTTTCCGGG GCAGGGGGCG CAGTGGGTGG GCATGGGTGC CCAGCTG-^TG 

24951 GACGAGTCTG CGGTGTTCGC GGCGCGGATC GCGGAGTGTG AGCGGGCCCT

25001 GTCGGCGCAT GTGGACTGGT CGCTGAGTGC GGTGTTGCGC GGGGACGGGA 

25051 GTGAOCTGTC CCGGGTGGAA GTGGTGCAGC CGGTGCTGTG GGCGGTGATG 

25101 GTCTCGCTGG CTGCGGTGTG GGCGGATTAC GGGGTCACTC CGGCTGCCGT 

25151 GATCGGGCAC TCGCAGGGTG AGATGGCTGC CGCGTGTGTG GGGGGGGCGC 

25201 TGTCGCTGGA GGATGCGGCG CGGATCGTAG CGGTACGCAG TGACGCGCTT

25251 CGTCAGCTGC AAGGGCACGG CGACATGGCC TCGCTCAGCA CCGGTGCCGA 

25301 GCAGGCCGCT GAGCTGATCG GTGACCGGCC GGGCGTGGTC GTCGCGGCGG 

25351 TCAATGGGCC GTCGTCTACG GTGATTTCAG GGCCGCCGGA GCATGTGGCA

25401 GCCGTGGTCG CGGATGCGGA GGCACGTGGT CTGCGCGCCC GTGTCATCGA 

25451 CGTCGGCTAT GCCTCGCATG GCCCCCAGAT CGACCAGCTC CACGATCTGC 

25501 TGACCGAACG CCTGGCCGAC ATCCGGCCCA CGAACACGGA CGTGGCCTTC 

25551 TATTCGACGG TCACCGCCGA GCGCCTGACG GACACCACGG CCCTlSGACAC

25601 GGATTACTGG GTCACCAACC TCCGTCAGCC CX3TCCGGTTC GCCGACACCA

25651 TCGAAGCCCT TCTCGCGGAC GGCTACCGCC TGTOCATCGA GGCCAGCGCC

25701 CACCCCGTGC TGGGCCTGGG CATGGAGGAG ACCATCGAGC AGGCGGACAT 

25751 GCCCGCCACC GTCGTCCCCA CCCTCCGCCG CGACCACGGC GACACCACCC 

25801 AGCTCACCGG CGCCGCCGCC CACGCCTTCA CCGCCGGCGC CGATGTCGAC 

25851 TGGCGGCGCT GGTTCCCGGC CGACCCCGCC CCCCGCACGA TCGATCTCCC

25901 CACCTACGCC TTCCAGCGCC GCCGCTACTG GCTGGCCGAC ACAGTGAAGC
25951 GGGACAGCGG ATGGGACCCG GCCGGGTCGG GGCATGCCCA GTTGCCGACC

26001 GCGGTCGCCC TCGCCGACGG GGGAGTGGTG CTGAACGGCC GGGTGTCCGC 

26051 CGAGCGCGGT GGCTGGCTGG GCGGGCATGT GGTGGCGGGG ACGGTTCTGG 

26101 TGCCGGGTGC GGCGTTGGTG GAGTGGGTGT TGCGGGCCGG TGATGAGGCG 

26151 GGTTGCCCCT CGCTTGAGGA GTTGACGCTC CAGGCGCCGT TGGTGTTGCC

26201 CGAGTCGGGT GGGTTGCAGG TTCAGGTGGT CGTGGGTGCG GCTGATGjfeC 

26251 AGGGCGGCCG TCGTGACGTA CATGTGTATT CGAGGTCTGA GCAGGACGCG

26301 TCGGCGGTGT GGCAGTGCCA TCCCGTCGGT GAGCTCGGGC GCGCGTCGGT 

26351 GGCGCGGCCG GTGCGGCAGG CCGGGCAGTG GCCTCCGGCG GGGGCCGAGC

26401 CGGTGGAGGT GGGCGGCTTC TACGAGGGGG TCGCGGCCGC CGGTTACGAG

26451 TACGGTCCGG CGTTCCGTGG GCTGCGCGCG ATGTGGCGGC ACGGTGATGA 

26501 CCTCCTTGCG GAGGTCGAGC TGCCGGAGGA GGCCGGTTCG CCGGCCGGTT 

26551 TCGGCATCCA CCCGGCGCTG CTGGACGCCG CCCTGCACCC GCTGCTCGCA 

26601 CAGCGGAGCC GGGACGGGGC CGGGGCGGGG GCCCACGGCG GGCAGGTGCT 

26651 GCTGCCTTTC AGCTGGAGCG GTGTTTCCCT GTGGGCCAGC GAGGCCACCA 

26701 CTGTGCGGGT GCGGGTCACC GGGCTGGGAG GAGGGGACGA CGAGACGGTG 

26751 TCCCTGACGG TAACCGACCC CGCCGGTGGC CCCGTGGTGG ACGTGGCAGA 

26801 GCTGCGGTTG CGGTCGACGA GCGCCCGGCA GGTGCGGGGT TCGGCAGGCC 

26851 CCGGCGCGGA CGGGCTCTAC GAGCTGCGGT GGACACCGTT GCCCGAGCCG 

26901 CTTCCCGTAC CGGCCCCCGC GAACGGTCGC GATGTGGCCG CCGACCTGTC 

26951 CGGATGCGCG GTGCTCGGCG AACTGGTCGC GGAACCGGGC CCGGGCATCG 

27001 ACCTGGAGGG CTGCCCCTGC TACCCGGGCG TCGGCGCGCT CGCCGACAAC 

27051 GCCTCCCCGC CCTCCATGAT CCTCGCCCCC GTGCACAGCG ACACCACAGG 

27101 CGGCGACGGA CTCGCCCTGA GGGAAGGGGT GTTGCGCGTC ATCCAGGACT 

27151 TCCTGGCTGC ACCGAOTCTG tSAACAGAAAC AGAGGCGCCT GGCGTTGGTG

27201 ACCCGGGGCG CGGCGGACAC AGGTAGCACG ACGGGAGGCT CGGCTGCCGC
27251 GGCAGAGGCA GTCGACCCGG CGGTCGCGGC CGTATGGGGC CTAGTACGCA

27301 GCGCGCAGTC GGAGAACCCC GGCCGCTTCG TACTGCTGGA CACCGACGCG

27351 CCCCTCGACC AGGCGTCCGT TGCCCCTCTC GTGGACGCGG TGCGGTCTGC 

27401 CGTGGAGGCG GACGAGCCCC AAGTCGCCCT GCGCGGGGGA CGGTTGCTCG 

27451 TGCCCAGGTG GGCGCGGGCC GGCGAGCCCG TCGAGCTGGC CGGGCCGGCC

27501 GGAGCGCGGG CGTGGCGGCT GGTGGGCGGA GACTCCGGGA CGCTGGAGbc 

27551 CGTCGTGGCG GAGGCTTGCG ACGACATTGT GCTGCX3CCCG TTGGCGCCGG 

27601 GCCAGGTCCG CGTCGCCGTC C^TACGGCCG GGGTCAATTT CCGTGACGTC 

27651 CTGATCGCCC TGGGCATGTA CCCGGACCCG GACGCGCTGC CCGGCACCGA 

27701 GGCGGCCGGC GTGGTGACGG AGGTCGGGCC GGGCGTCACC CGTCTGTCGG 

27751 TGGGCGACCG CGTGATGGGC ATGATGGACG GCGCCTTCGG CCCGTGGGCC

27801 GTCGCCGACG CGCGCATGCT GGCCCCGGTC CCGCCCGGCT GGGGCACCCG 

27851 GCAGGCGGCC GCCGCTCCCG CCGCGTTCCT GACGGCTTGG TACGGGCTGG 

27901 TGGAGCTGGC CGGTCTGAAG GCGGGCGAGC GTGTGTTGAT CCATGCCGCC

27951 ACGGGTGGTG TGGGGATGGC GGCGGTGCAG ATCGCCCGGC ATGTGGGTGC 

28001 CGAGGTGTTC GCCACCGCGA GTCCGGGCAA GCACGGCGTG CTGGAGGAGA 

28051 TGGGCATCGA CGCCGCCCAC CGCGCCTCGT CGCGCGACCT CGCCTTCGAG 

28101 GACGCCTTCC GGCAGGCCAC CGACGGCCGT GGCGTGGACG TCGTCCTCAA

28151 CAGCCTCACC GGTGAACTGC TCGACGCGTC CCTGGGATTG CTCGGCGACG 

28201 GCGGGCGCTT CGTGGAGATG GGCAAGAGCG ATCCGCGCGA CCCCGAGCTG 

28251 GTCGCGCTGG AGCACCCCGG GGTGTCGTAC GAGGCCTTCG ACCTCGTCGC 

28301 CGACGCCGGG CCCGAGCGGC TCGGGCTGAT GCTCGACAGG CTCGGCGAGC

28351 TCTTCGCCGG CGGATCACTG GTACCGCTGC CGGTCACCGC ATGGGCGCTG

28401 GGGCX5GGCGC GAGAGGCGCT CCGCCACATG AGTCAGGCGA GGCACACCGG

28451 CAAGCTGGTG CTC<3ACGTGC CCGCGCOGCT CGACCCCGAC GGCACCGTCC 

28501 TCGTCACGGG GGGTACCGGC ACCATCGGCG CGGCCGTGGC CGAACACCTG 
28551 GCGCGTACCG GGGAGAGCAA GCAGCTGCTC ATCGTCAGCC GCAGCGGGCC
28601 GGCCGCCCAC GGCGCCGAGG AACTTGTCTC TCGTATAGCC GAGTTCGGGG 
28651 CCGAAGCCAC CTTCGTCGCT GCCGACGTGA GTGAGCCCGA CGCGGTCGCC 
28701 GCCCTGATCG AAGGGATCGA TCCGGCCCAT CCGCTGACCG GTGTCGTGCA 
28751 TGCCGCCGGA GTACTCGACA ACGCTCTGAT CGGCTCCGAG ACCACCGAAA 

28801 GCCTCACCCG CGTATGGGCG GCGAAGGCCG CCGCCGCGCA GCAACTCc3lC

28851 GAGGCCACGA GGGAGTCGAG GCTGGGACTG TTCGTGATGT TCTCCTCCTT

28901 CGCCTCCACC ATGGGCACCC CAGGGCAGGC CAACTACTCC GCCGCCAACG 

28951 CCTATTGCGA CGCGCTGGCC GCTCTCCGAC GCGCGGAGGG GCTCGCCGGC

29001 CTGTCCGTGG CGTGGGGGTT GTGGGAGGCC ACCAGCGGCC TGACCGGGAC 

29051 GTTGTCGGCG GCCGACCGGG CCCGCATCGA CCGGTACGGC ATCAGGCCGA 

29101 CCAGCGCGGC ACGCGGCTGC GCCCTGCTGG CAGCGGCACG CGCCCACGGG

29151 CGCCCCGACC TGCTCGCCAT GGACCTGGAC GCCCGCGTAC CCGCCGCGTC 

29201 CGACGCTCCG GTCCCCGCCG TGCTGCGCAC TCTGGCGGCQ GCCGGAGCGC 

29251 CCGCCACCGC CCGTCCCACC GCGGCGGCGG CCGCTGACGG GGCGACGGAC

29301 TGGTCCGGCA GGCTCGCCGG CCTCACCGAG GAGGCACGGC TCGAACTCCT 

29351 CACCGAGTTG GTGTGCACCC ACGCGGCAGG GGTGCTCGGG CACGCCGACG 

29401 CGGGCGCGGT CGAGGTGGAC GCGCCGTTCA AGGAACTCGG CTTCGACTCG 

29451 CTGACCGCCG TCGAACTGCG CAACCGGATC GCCGCCGCGA CCGGCCTGAA 

29501 ACTGCCCGCC GCCCTCGTCT TCGACTACCC OCAGGCTCGC GTTCTCGCCG

29551 CCCACCTGGC CGAACGGCTC GTCCCGGAGG GCGCGGGGGC CATGGGCGGT 

29601 GTGAGCGGTG CGGAGGGCGT GAGG6A0GCG TACGGGGCAG GCGGTCCGGG 

29651 CGGC<5ACATG ACCGCCCAGG TCTTGCTGGA GGTGGCCCGC GTCGAGCACA

29701 CCCTGTCCGC CGCCGTCCCG CACGGCCTGG ACCGGGCGGC CX3TGGCGGCC

29751 CGCCTGGAGG CGCTGCTCGC CCGCTGCACG GCGACGACGG CGGCCACGGG 

29801 GGCCGCGGGA GCCGCGGTGG AGGGTGACGG CGACAGCGAC GGCGACGGCG
29851 CCGTGGATCA GCTGGAGACG GCCACCGCCG AGCAAGTACT GGACTXCATC

29901 GACAACGAAC TCGGGGTGTG AGCCGCGTGC CGGCCGCACA CCAGGCGATC

29951 ACGGGCGGGG AGCTGCAGCG CACATGGTGA GCGAAGAGAA ACTGGTCGAC 

30001 TACCTCAAGC GTGTCTCCGC GGACCTGCAC GCCACCCGGC AGCGGCTGCG

30051 CGAGGCGGAG GAGCGCGGCC AGGAACCCGT GGCCGTGGTG GAGGCCGCCT 

30101 GCCGCTACCC CGGCGGCATC CGCACCCCCG AAGACCTGTG GGACCTGGTC 

30151 GCCGCGGGCG GCAACGCCCT GGGCGCCTTC CCCGACAACC GCGGCTGGGA 

30201 CCTGCGACGC CTCTTCCACC CCGACCCCGA CCACCCGGGG ACGACCTACG 

30251 CCCGCGAGGG CGGCTTCCTC CACGACGCCG ACCTGTTCGA CCCGGAGTTC 

30301 TTCGGCATCA GCCCCCGCGA GGCCGCGGTC CTCGACCCGC AGCAGCGACT 

30351 GCTCCTGGAG TGCGCCTGGG AGGCACTGGA GCGCGCGGGC ATCGACCCGC 

30401 GGTCCCTCCA GGGCAGCCGT ACCGGCGTGT ACGCGGGTGC CGCCCTGCCC 

30451 GGCTTCGGCA GCCCGCACAT CGACCCCGCC GCCGAGGGCC ACCTGGTCAC

30501 CGGCAGCGCC CCGAGCGTCC TCTCGGGCCG GCTCGCCTAC ACCTTCGGCC 

30551 TCGAAGGGCC CGCGGTGACG ATCGACACCG CCTGCTCGTC GTCGCTCGTC

30601 GCCGTGCACC TGGCGGCCCA CGCX3CTGCGG CAGCGCGAGT GCGATCTGGC 

30651 GCTCGCGGGC GGTGTCACCG TCATGACCAC CCCGTACGTG TTCACCGAGT 

30701 TCTCGCGCCA GCGCGGCCTG GCCGCCGACG GCCGGTGCAA GCCCTTCGCG 

30751 GCCGCCGCGG ACGGCACGGC CTTCTCCGAG GGCGCCGGAC TCCTCGTACT 

30801 GGAACGCGTC TCCGACGCCC GCCGGGCCGG CGACCGGGTG CTGGCCGTCA 

30851 TCCGCGGCTC GGCCGTCAAC CAGGATGGCG CGAGCAACGG CCTCACCGCC 

30901 CCCAACGGCC CCGCCCAGCA GCGCGTGATC CGCGCCGCCC TCGCCGGGGC 

30951 GCGGCTCTCG CCCGCGGAGG TGGACGCGGT CGAGGCGCAC GGCACCGGCA 

31001 CCCGGOrGGG <:GACCCCATC GAGGCCGACG CGCTCCTCGC C^ICCTACGGT 

31051 CAGGAGGGCC ACGGGGGCGG GCCGCTGTGG CTGGGCTCGG TGAAATCCAA

31101 CATCGGCCAC ACGCAGGGCG CGGCGGGTGC CGCGGGCCTG ATCAAGATGG 
31151 TCCAGGCACT GCGGCACGAG ACGCTGCCCG CCACGTTGTA CGCCGACGAG
31201 CCCACCCCGC ACGCCGACTG GGAGTCGGGC GCGGTGCGCC TGCTCAGCGC
31251 GCCGGTCGCC TGGCCGCGCG GGGAGCACGG GGAGCACACC CGCAGGGCCG
31301 GCATCTCCTC CTTCGGCATC TCCGGCACGA ACGCCCACCT CATCCTGGAG
31351 GAGGCGCCCG CGGCCGACGC CGAAGGAGCG GGTGGCGACG GCGATGGCGA 
31401 CGGGGGAGGG GTGCGGCCGG TGGTGCGGGT CGGCGCCACG GGCCCCCGCG 
31451 AAGAGCAGGG CCAAGGACAG GGCCAAGAGC AGCACCAACA GCAACGTCAG
31501 CAGCGGCAGC GGTCGTCGAT GATGCCGACG CCGCACCTCC CGTGGCTGCT
31551 GTCCGCCCGC AGCCCCGCCG CGCTCCGCGC CCAGGCCGAC GCGCTGGCGA 
31601 ACCATGTCGC CCACGCGGAC CACTCCATCG CCGACATCGG CGGCACACTG 
31651 CTGCGCCGCA CCCTGTTCGA GCACCGGGCG GTCGTCCTCG GAACCGACCG 
31701 TGATGAGCGT GCCGCAGCGC TTGCCGCCCT CGCGGCAGGA CGCGCACACC 
31751 CCGCGCTGAC CCGGGCCGCA GGGCCGGCGA GGAACGGCGg CACCGCCTTC
31801 CTGTTCACCG GCCAGGGAAa CCAACGCCCA GGCATGGGCA GGCAGTTGTA 
31851 CGACACCTTC GACGTCTTCG CCGAGTCGCT CGACGAGACC TGCGCCCGGC 
31901 TCGACCCCCT GCTCGAACAG CCGCTGAAGC CCGTCCTGTT GGCCCCCGCG 
31951 GACACCGCGC AGGCCGCCGT GCTGCACGGG ACCGGCATGA CGCAGGCCGC
32001 GCTGTTCGCC CTCGAAGTCG CCCTGTACCG CCAGGTCACC TCCTTCGGGA 
32051 TCGCCCCCAG CCACCTGACC GGGCACTCCG TCGGCGAGAT CGCCGCCGCC 
32101 CACGTCGCCG GGGTGTTCTC CCTGGCGGAC GCCTGCACGC TGGTCGCGGC 
32151 CCGGGGCCGC CTCATGCAGG CCCTGGCCGC AGGTGGCGCC ATGCTCGCCG 
32201 TCCAGGCGGC CGAGGACGAC GTACTGCCGC TGCTCGCCGG GCAGGAGGAA 
32251 CGTCTCTCCC TCGCCGCCGT CAACGGCCCC ACCGCCGTCG TCGTGTCCGG 
32301 TGAGGCCGCT GCCGTCGGGG AGGTGGAGAA GGCGCTGCGC GGGCGCGGAC 
32351 TGAAGACCAA GCGGCTCAAC GTCAGTCACG CCTTCCACTC GCCGCTCATC 
32401 GAGCGGATGC TGGACGACTT CCOCGAAGTG GCCGGCGGGC TGACCTTCCA 
32451 CGCGCCGACG CTGCCCGTCG TCTCCAACCT CACCGGCCGC CTCGCCGACG

32501 CGGAGCTGAT GGCCGACGCC GAGTACTGGG TGCGGCACGT ACGCCGGCCG 

32551 GTGCGGTTCC ACGACGGGCT GCGCGCTCTC AGCGAGCAAG GCGTCGTGCG 

32601 CTACCTGGAG TTGGGGCCCG ACCCGGTCCT CGCCACCATG GTCCAGGACG 

32651 GTCTCCCGGC CCCGGCGGAG GGAGAGGAGC CCGAGCCGGT CGTCGCCGCG 

32701 GCGCTGCGCT CCAAGCACGA CGAGGGACGC ACCCTGCTGG GTGCCGTCGC

32751 CGCGCTCCAC ACCGACGGAC AGCCGGCCGA CCTCACCGCC CTCTTCCCCG 

32801 CCGACGCCGG GCAAGTGCCG CTCCCCACCT ACCGGTTCCA GCGGCGACGG 

32851 TACTGGCGCG TCGCGCCCGA CGCCGCCGCG CCGGCCCGCG CCGCCGGCCT

32901 CCAGGAGACC GGCCACCCGC TGCTGCCCGC CGTCATCCGG CAGGCCGACG 

32951 GCGGCATCCT GCTCGCGGGA CGCCTGTCCC TGCGTACGCA TCCATGGCTC 

33001 GCCGACCACA CCATCGCGGG CGGCGTCCCG CTGCCCGCCA CCGCCTTCGT 

33051 CGAACTCGCC CTGCTCGCAG GGCGGCACGC CGCCTGCGAC ACGATCGACG 

33101 ATCTGACGCT GGAGACGCCG CTGCTGCTCG ACGACACCGG TACCGGTGTC

33151 GGGGCGGCTG TGGGCGCGGG CGCCGATGCC CTCGTCGATG CCATAGAAGT 

33201 GCAGCTTGCC CTCGGCGCTC CCGACGGTTC CGGCCGCCGT GCTCTCACCG 

33251 TCCACTCCCG TCCTGCCGAC GATGCGGCTG ACGACGGCGA CGCGGCCGAC 

33301 GCGGCCGATG CGGCAGGCCG GGGAGGCCCG GGCGGCTCGG GTGACCTGGG 

33351 CGATCCTGGC GATCCGGGCG ATCTGGGCGA CGGCGGGGGC TCCCGCGGCT 

33401 GGCGCCGTCA CGCCACCGGC ATCCTCAGCG CCGGCCCGGC CGCCGAACCG 

33451 GCCGCCCCCG ACGCCGCTCC CTGGCCGCCC GCCGACGCCA CQGCCCTCGA 

33501 CGTCGACGCG CTGTACGCCC GGCTCGACGC GGAGGGGTAC AGCTACGGGG

33551 CCGCCTTCCG GGCCGTCCAC <3CCGCCTGGC GGCACGGCGA CGACCTCTAC

33601 <3CCGATGTCG GCCTCGCCGA CGAACAGCGC GCTGAAGCCG ACGCGTTCGC 

33651 CCTCCACCCG GCCCTGCTCG ACGCCGCCCT GCATGCCGTC GACGAGCTGT

33701 ACCGCGGCAG TGAGGGGCGG GGGCAGGAGC AGGGGCAGGG TGGTCAGGAG
33751 CCGGAGCAGG GCCGTGGCGA CGCGGACGCC CCGGTACGGC TGCCGTTCTC

33801 CTTCAGCGAC ATACGCCACC ACGCCACCGG GGCCACACGG CTGTGGGTCC 

33851 GCr/ AGCCC CCAGGGCGAC GATCGGCTGC GGCTGTCCCT GACCGACGGC 

33901 GAGGGCGGGC AGGTCGCGAC AGTCGACGCC CTCCAACTGC GGTTGATCCC 

33951 CGCCGACCGG TGGCGCGCGG CCCGCCCCAC CACAGCCGCC CCCCTGTACC 

34001 ACCTGGACTG GCACGAGCTG CCGTTGCCCG AGCCGGCCGA GACGGACCCG 

34051 GCCGCCCACT CCTGGGCTGT GCTCGGAGCG CACGACGCGG GCCTCGCTCC 

34101 CGCCGCGCAC TACCCGGACC TGGCGGCCCT GAAAGCCGCC GTCGAGGCCG 

34151 GCGAGCCCGT GCCGGACATC GTCTTCGCAC CGTTCCCCGC GCAGGGGACG 

34201 GAGACCGATG TCCCGGCTCA GGTACGAGCC CACGCCCGGC ACGCCCTGGA 

34251 GCTGCTGCGC GACTGGCTCA CCACGGAAGC TTTCGCCGCC GCCCGCCTCG 

34301 TCGTCCTCAC GACCGGTGCG GTCACCGCCC GCCCAGAGGA CGGGCCCGCC 

34351 GACCTGGCCA CCGCACCTGT ATGGGGCCTG GTCCGAGCCG CCCAGGCCGA 

34401 ACAACCCGAC CATGTCGTCC TGGTGGACAT CGACAAGGAC ATCGATAAGG

34451 ACACCGACGA GGAGACCGAC CAGGCCACCG ACGCGGGCAC CGCATCGCGC 

34501 CACGCTCTGC CCGCCGCCTT GGCCGCGGCG GCCGCCCAAG CCGAGACACA

34551 GCTCGCCCTG CGCGCGGGCA CCGTGCTCGT GCCGCGCCTC GCCGTCGTCC 

34601 CGCCCCGGAC CGACACCCCA GCGCTGCACG CCACCGCCCC GGAGAGCACC

34651 ACGGACACTG TGGACTCCAC GGGCATCGCG GGCGCTGCGG AATCCGGCGG

34701 CACCGTCCTG ATCACCGGCG GAACCGGCGG CCTCGGGCAG GCCGTCGCCC 

34751 GTCACCTCGC CGCCGCGCAT GGCGCCCGCC ACCTGCTCCT CGTCAGCCGC

34801 AGGGGCGACG CCGCCGAGGG CGTCGCCGAG TTGCGCGCCG ACCTCGCGGA

34851 CGACGGCGTC GACGTACGCG TCGCCGCCTG CGACATCACC GACCX3CGACG 

34901 CGCTGGCCGG ^CTCCTCGCG GACATCCCCG CCGCGCACCC GCTCACCGCG 

34951 GTCGTGCACA CCGCGGGGGT CATCGACGAC AGCCTCATCA CGGCGATGAC 

35001 CCCCGAGCGG CTCGACGOCG TCCTCGCAGC CAAGGCCGAC GCGGCGTGGC
35051 ACCTGCACGA ACTCACCCGC GACAAGGACC TGTCGGCCTT CGTCCTCTTC

35101 TCCTCGGGCG CCTCCGTCCT CGGCAACGGC GGCCAGGCCA ACTACGCGGC 

35151 CGCCAACACC TTCCTCAACA CCCTCGCCGA ACACCGCCGC GCGGCCGGCC

35201 TCGCCGCCAC CTCCGTGGCC TGGGGCCTGT GGGAGTCCGC GTCCGGCGGC 

35251 ATGGCCGCCC GGCTCGGCGA CGCCGACCGC GCCCGCATCC ACCGCACCGG 

35301 CGTGACGGGC CTGACCGACG AGCAGGCCCT GGCCCTCTTC GACGCGCdcC 

35351 TGACCGCCGA GCACCCCACG GTCCTGGCCA CCCGCTTCGA CCGCGCCGTG

35401 CTGCGCGGCC AGGCCGCCGC CCGCACCCTG CAGCCCGCCC TGCGCGGCCT 

35451 GGTACGCACT CCGCGCCCCA CCGCGTCCGC CGGGGCCATC GGGTCCACCG 

35501 CAGCCACCGG GTCCGCCACG GACGAGAACG CGCCCTCCTC GTGGGCCGCC 

35551 CGGCTCGCCC GGCTGTCCGC CGCCGACCGC GACCGCGCCC TCAACGAACT 

35601 CATTCGCGAG CAGATCGCGA CCGTCCTGGC ACACCCCTCA CCCGACACCA 

35651 TCGAACTGGG CCGCGCCTTC CAGGAGTTGG GCTTCGACTC GCTCACCGCC 

35701 CTGGAACTCC GCAACCGCCT CTCCACGGCC ACCGGCATCC GGCTGCCCGC 

35751 CACCCTCGTC TTCGACCACC CGAGCCCCAC CGCCCTCGTA CGCCATCTCC

35801 ACAGCCATCT CCCCGACGAG GCCCAGCACA CGTCCCCGAC CGCCCCCGGC 

35851 GCCTCTGCGG AGGGCACCGC CGCCACGGCC ACCGGCATCG ACGACGACCC

35901 GATCGCCATC GTCGGCATGG CGTGCCGCTA CCCGGGCGGC GTGACCTCGC 

35951 CCGAGCAGCT GTGGCAGCTC GTGGCCACCG GCACCGACGC CATCGGCCCG 

36001 TTCCCCGAGG ACCGCGGCTG GGACACGGCC GGACTGTTCG ATCCCGACCC 

36051 CGACCAGGTC GGCCACAGCT ACACCCGCGA AGGCGGCTTC CTCTACGACG 

36101 CCGCCCGCTT CGACGCGGGC TTCTTCGGCA TCAGCCCGCG CGAGGCCGCC 

36151 GCCACCGACC CGCAGCAGCG CCTGCTCCTG GAAACCGCCT GGCAGGCGTT 

36201 CGAACACGCG <3GCATCGACC CCGCCGCCCT GCGCGGCACC CCGTGCGGCG

36251 TCATCACCGG AATCATGTAC GAGGACTACG GATCCCGCTT CCTCGCGCGC

36301 AAACCGGACG GCTTCGAGGG CCGCATCATG ACCGGCAGCA CGCCGAGCGT
36351 GGCCTCCGGC CGGGTCGCGT ACACCTTCGG CCTGGAGGGC CCCGCCATCA
36401 CGGTGGACAC CGCGTGCTCC TCCTCGCTGG TCGCGATGCA CCTGGCGGCG
36451 CAGGCGCTGC GGCAGGGCGA GTGCGAACTG GCCCTGGCCG GGGGTGTGAC 
36501 CGTGATGGCC ACCCCGAACA CCTTCGTGGA GTTCTCCCGC CAGCGCGGCC 

36551 TGGCCCCCGA CGGCCGCTGC AAGCCGTTCG CCGCCGCGGC GGACGGCACC 

36601 GGCTGGGGCG AGGGCGCCGG ACTCGTCGTC CTGGAGCGCC TCTCCGACGC

36651 GCGCCGCAAG GGACACCGCG TCCTCGCCCT GCTGCGCGGT TCGGCCGTGA 

36701 ACCAGGACGG CGCGAGCAAC GGCATGACCG CCCCGAACGG TCCCTCGCAG 

36751 GAACGGGTCA TCCGCACCGC CCTGGCCGGC GCGGGCCGTG GTCCCGAGGA 

36801 CATCGACGTG GTGGAGGCGC ACGGCACCGG CACCACGCTC GGCGACCCGA

36851 TCGAGGCGCA GGCCCTGCTC GCCACGTACG GGCAGGGGCG CCCGGAGGAC 

36901 CGCCCGCTCT GGCTCGGCTC GGTGAAGTCG AACATCGGCC ACACGCAGGC 

36951 CGCCGCCGGT GTCGCGGGCG TCATCAAGAT GGTCATGGCA CTGCGCCACG 

37001 AGCAACTGCC GACGACCCTG CACGCCGACG AGCCGACCCC CCACGTGCAA 

37051 TGGGACGGCG GCGGCGTACG TCTCCTGACC GAACCGGTCC CGTGGTCGCG 

37101 CGGCGAGCGC ACGCGGCGCG CCGGGGTGTC GTCCTTCGGG ATCTCCGGGA 

37151 CGAACGCGCA CGTGATCCTG GAGGAGCCGC CGGAGGAGGA CCTGCCCGAG

37201 CCCGTGGCGG CGGAGCCGGG TGGGGTGGTG CCGTGGGTGG TGTCCGGGCG 

37251 GACGCCGGAC GCGTTGCGTG AACAGGCGCG GCGGCTCGGC GAGTTTGTCG 

37301 TCGGTGCCGG GGATGTGTCG GCAGCCGAGG TGGGATGGTC ACTGGCCACG

37351 ACGCGGTCGG TGTTCGAGCA CCGGGCCGTG GTGGCGGGCC GGGACCGGGA

37401 CGATCTGGTT GCCGGGATGC AGGCGCTGGC GGCAGGGGAG ACGCCGACAG

37451 ATGTCGTGTC CGGTGCGGCG dCTTCCTCCG GTGCGGG^CC GGTGTTGGTG 

37501 TTCCGGGGGC AGGGGTCGCA CTGGGTGGGC ATGGGTGCCC AGCTCCTTGA 

37551 OGAGTCCCCC GTCTTCX3CGG CGGGGATGGC GGAGTGTGAG CAGGCGCTGT 

37601 CGGCGTACGT GGACTGGTCG CTGAGTGATG TCCTGGGCGG GGACGGGAGT
37651 GAGCTGTCCC GGGTCGAGGT CGTGCAGCCC GTGTTGTGGG CGGTAATGGT

37701 CTCGCTGGCT GCCGTCTGGG CGGATTACGG GGTCACTCCG GCCGCTGTGG 

37751 TGGGGCATTC GCAGGGTGAG ATGGCTGCCG CGTGTGTGGC GGGGGCGCTG 

37801 TCGCTGGAGG ATGCGGCGCX3 GATTGTGGCG GTACGCAGTG ACGCGCTTCG

37851 TCAGCTGCAA GGGCACGGCG ACATGGCCTC ACTCGGCACT GGTGCCGAGC 

37901 AGGCCGCTGA GCTGATCGGT GATCGGCCGG GAGTGGTCGT CGCGGCAGTC 

37951 AACGGGCCGT CGTCTACCGT GATTTCGGGG CCGCCGGAGC ATGTGGCCGC 

38001 TGTGGTCGCG GAGGCGGAGG CACGTGGTCT GCGCGCCCGT GTGATCGACG 

38051 TCGGGTATGC CTCGCACGGC CCCCAGATCG ACCAGCTCCA CGACCTCCTC

38101 ACCGAGGGCC TGGCTGACAT CCGGCCCGCG AACACGGACG TGGCCTTGTA 

38151 TTCGACGGTC ACCGCCGAGC GCCTGACGGA CACCACAGCC CTGGATACGG 

38201 ATTACTGGGT GACCAACCTC CGCCAGCCGG TCCGGTTCGC CGACACCATC 

38251 GAAGCGCTTC TCGCGGACGG CTATCGCCTG TTCATCGAGG CCAGCGCGCA 

38301 CCCGGTGTTG GGCCTGGGCA TGGAGGAGAC CATCGAGCAG GCGGACATCC 

38351 CTGCCACGGT CGTCCCCACC CTGCGCCGCG ACCACGGCGA CACCACCCAG 

38401 CTCACCCGCG CCGCCGCCCA CGCCTTCACC GCCGGCGCCG ATGTCGACTG 

38451 GCGACGCTGG TTCCCGGCCG ACCCCACCCC CCGTACCGTC GACCTCCCCA 

38501 CCTACGCCTT CCAGCACCAG CACTACTGGC TGGAGGAGCC CAGTGGGCTC 

38551 ACCGGAGACG CCGCCGACCT CGGCATGGTG GCCGCCGGGC ATCCGCTGCT 

38601 GGGTGGCTGT GTGGAACTCG CGOAGAGCGA CTCGfACTTG TTCACCGGGC 

38651 GGCTCTCGCG CAGGGCTCCG TCCTGGCTGG CCGAAGACGT GGTGGCGGGG 

38701 ACGGTTCTGG TGCCGGGTGC GGCGTTGGTG GAGTGGGTGC TGCGGGCCGG 

38751 CGATGAGGCG GGATGCCCGA CGATTGAGGA ACTGACGCTC CAGGCGCCGT 

38801 TGGTGCTGCC CGAGTCGGGC GGGTTGCAGG TTCAGGTGGT CGTGGGTGCG 

38851 ACCGATGAGC AGAGCGGCCG TCGTGACGTA CACGTGTATT CGAGGTCTGA 

38901 GCAGGACGCG TCGGCGGTGT GGGTGTGCCA TGCCGTCGGT GTGGTGAGCT 
38951 CCGAAATGCC AGAAGCGGCA GCCGAGTTGA GTGGGCAGTG GCCTCCTGCC 

39001 GGGGCCGAAG CCGTGGATGT CGAGGACTTC TACGCGCGGG CCGCGGAGGC 

39051 CGGATACGCC TACGGTCCGG CGTTCCAGGG GCTGCGGGCG CTGTGGCGGC 

39101 ACGGGACGGA GCTGTTCGCC GAGGTGGTGC TGCCCGAACA GGCGGGTGGG 

39151 CACGACGGTT TCGGCATCCA CCCGGCGCTG CTGGACGCCG CCCTGCATCC 

39201 GCTGATGCTC CTCGACCGGC CCGCGGACGG GCAGATGTGG CTGCCGTTCG 

39251 CGTGGAGCGG GGTGTCGCTG AACGCGGACC GGGCGACCCA CGTCCGTGTC 

39301 CGGCTCTCCC CGCGGGGGGA GGCGGCCGAG CGTGACCTGC GGGTCGTCAT 

39351 CGCCGACGCG ACCGGGGCGC CCGTCCTGAG GGTCGACGCC CTGACCCTGC 

39401 GCGCGGCCGA TCCCGGCCGG CTGGGTGCGG CGGCCCGTGG CGGTGTCGAC 

39451 GGCCTCTACA CCGTCGACTG GACCCCGCTG CCCCTGCCCC AGCCCCTTCC 

39501 GCTGCCGCGG ACGGATGCAG GGGGGAGTGC CGACTGGGTC ATACTCTCGG 

39551 ACAACTCCAG TGCAGCTCTG GCTGATGCGG TGTCGTCCGC GACGGCGGCA 

39601 GGTGGCGGAG CGCCGTGGGC ATTGCTCGCT CCCGTGGGTG GCGGCTCTGC 

39651 CGATGACGGG CTGCCGGTGG TGCGGCGGAC CCTCTCCCTC GTACAGGAGT 

39701 TCCTGGCCGC CCCGGAGCTG ACCGAGTCCC GTCTCGTCAT CGTGACACGC 

39751 GGTGCCGTGG CCACCGACGC CGATGGTGAC GTCGCGGCGT CCGCGGCAGC 

39801 GGTATGGGGC CTGATCCGCA GCGCCCAGTC GGAGAACCCG GGCCGCTTCG 

39851 TCCTGCTCGA CGTCGAGGAG GAGCACCTCC ACCCGGACGG CGGGGAACTG 

39901 CCGTACGCCG CCCTGCGCCA CGCCGTAGAG GAGCTCGACG AGCCTCAACT 

39951 TGCCCTCCGC AGCGGCAAAT TCCTCGTACC GCGCATGACG CCCGCCGCCG 

40001 CCCCCGAGGA GCTCGTCCCG CCGGTCGGTA CGTCCGGCTG GCGCCTCGGC 

40051 ACCTCCGGTA CGGCCACCCT GGAGAATCTG TCGGTGATCG ACGCTCCCGA 

40101 GGCGTTCGCG CGGCTGGAGC CCGGGCAGGT GCGGATCTCC GTACGGGCGG 

40151 CGGGCATGAA CTTCCGTGAC GTGCTGATCG CGTTGGGCAT GTATCCCGAC 

40201 AAGGGCACGT TGGCGGGAAG CGAGGGCGCC GGACATGTGA CGGAGGTGGG 
40251 ACCGGGCGTC ACTCATCTGT CGGTCGGTGA CCGGGTGATG GGTCTGTTCG 

40301 AGGGCGCGTT CGCTCCGCTG GCCGTCGCGG ACGCCCGGAT GGTCGTCCCG 

40351 ATTCCGGAGG GCTGGAGCTT CCAGGAGGCC GCGGCGGTGC CCGTGGTGTT 

40401 CCTCACGGCC TGGTACGGCC TCGTGGACCT CGGCCGCCTC CGGGCGGGCG 

40451 AATCGCTGCT CATCCACGCG GGCACCGGCG GAGTGGGCAT GGCCGCCACC 

40501 CAGATCGCCC GCCACCTGGG CGCCGAGGTG TTCGCCACCG CGAGCCCCGC 

40551 CAAGCACGGC GTGCTCGACG GCATGGGCAT CGACGCGGCC CACCGCGCCT 

40601 CCTCCCGTGA CCTCGACTTC GAGGAGACCT TGCGGGCGGC GACGGGCGGG 

40651 CGCGGCATGG ACGTCGTACT CAACAGTCTG GCCGGGGAGT TCACCGACGC 

40701 CTCGCTGCGG CTGCTCGCCG AGGGCGGGCG CATGGTGGAC ATGGGCAAGA 

40751 CCGACAAGCG CGACCCCGAC CGGGTCGCGG CCGAGCACGC GGGCGCGTGG 

40801 TACCGGGCCT TCGACCTCGT GCCGCACGCG GGGCCCGACC GGATCGGGGA 

40851 AATGCTGGCG GAGCTGGGCG AGTTGTTCGG CTCCGGCGCC CTGGCGCCGC 
40901 TGCCGGTCCA GACCTGGCCG CTGGGCCGGG CGCGTGAGGC GTTCCGGTTC 

40951 ATGAGCCAGG CGAAGCACAC CGGCAAGCTG GTGCTGGAGA TCCCGCCCGC 

41001 CCTCGATCCG GACGGCACGG TGCTCATCAC CGGCGGCACC GGGGTCCTCG 

41051 CCGCCGCGGT GGCCGAGCAT CTGGTGAGGG AGTGGGGCGT ACGACACCTG 

41101 CTGCTGGCCG GGAGGCGCGG TTCCGAGGCG CCCGGGAGCA GTGAACTCGC 

41151 CGAGGAACTG ACCGAGTTGG GGGCCGAGGT GACCTTTGCC GCGGCCGATG 

41201 TCAGTGATCC GGAGGCCGTG GCGGAGCTCG TCGGCAAGAC CGATCCGGCG 

41251 CACCCGCTGA CCGGTGTGAT CGACGCGGCC GGTGTGCTGG ACGACGCCGT 

41301 GGTCACCGCA CAGACCCCGG AGAGCCTCGC GCGGGTGTGG GCGGCGAAGG 

41351 CGACGGCCGC ACACCTGCTG CACGAGGCGA CCCGGGAGGC GCGCCTCGGT 

41401 CTCTTCCTGG TGTTCTCCTC GGCGGCGGCG ACACTCGGCA GTCCGGGACA 

41451 GGCCAACTAC GCGGCGGCCA AGGCCTATTG CGACGCCCTC GTCCGGCAAC 

41501 GGCGTGCCGA CGGCCTGGCC GGTCTCTCGA TCGGCTGGGG TCTGTGGCAG 
41551 ACGGCGAGCG GCATGACCGG ACACCTCGGC GAGACGGACC TGGCACGCAT 

41601 GAAGCGCACC GGGTTCACCC CGCTGACCAC CGAAGGTGGC TTGGCCCTCC 

41651 TCGACGCCGC CCGCGCCCAC GGCCGCCCGC ACGTGGTCGC GGTGGACCTC 

41701 GACGCGCGCG CCGTCGCCGC GCAGCCCGCC CCGTCCCGGC CCGCGCTCCT 

41751 GCGCGCCCTG GCCGCGGGT0 CGACCCCGGG GGCCCGCACC GCCCGGCGCA 

41801 CCGCGGCCGC GGGCAGCGTC GCCCCGGCGG GCGGTCTCGC CGACCGGCTG 

41851 GCCGGCCTGC CGCATCCCGA ACGGCGCCGG CTGCTGCTCG ACCTCGTACG 

41901 TGGCAACGTC GCCGGCGTCC TCGGGCACAG CGACCACGAC GCCGTGCGCC 

41951 CGGACACGTC GTTCAAGGAG CTCGGCTTCG ACTCCCTGAC CGCCGTGGAA 

42001 CTGCGCAACC GGCTGGCCGC CGGCACCGGC CTGAAGCTGC CCGCGGCGCT 

42051 CGTCTTCGAC TACCCCGAGT CGGCCACCCT CGTCGACCAC CTCCTGGAGC 

42101 GTCTGTCGCC CGACGGCGCG CCGCCGCCCG TCAAGGACGC CGCGGACCCC 

42151 GTTCTCAACG ACCTCGGCAG GATCGAGTCC TCCCTGGACG CGCTCGCCCT 

42201 CGACGCGGAC GCGCGCAGCC GGGTCACCAG GCGTCTGAAC ACCCTGCTGT 

42251 CGAAGCTGAA CGGAGCCGCC ACCGCCGGCT CCCCGGCGGA CGTCACGGAC 

42301 CTGGACGCGC TGGACGCGCT GGACGACGTG TCCGACGACG AGATGTTCGA 

42351 GTTCATCGAC CGAGAGCTGT GACCCCCCTG CCCGCCCCGT CCCCCTTCCC 

42401 CGCCCCCACG TTCCCCGTGC CCTTCGCTGA TGGAGAAGTG ACGTTCGATG 

42451 TCGAGTGCTG AAGAGTCGAG TCCTGATGTG TCCGGCACGG GTGTGTCCGG 

42501 TACGGGAGAG TCCGCTACGG GTACGTCGAG TACGGAAGCC AAGCTTCGGC 

42551 AGTATCTGAA GCGGGTCACG GTGGACCTCG GCCAGGCCCG CCGGCGGCTG 

42601 CGCGAGGTGG AGGAGCGGGC CCAGGAGGCG ATCGCCATCG TCTCCATGGC 

42651 GTGCCGCTTC CCCGGCGACA CCCGCACGCC CGAGGCGCTG TGGGACCTGG 

42701 TCGCCGAGGG CGGCGACGCC ATCGACGACT TCCCCACCAA TCGCGGCTGG 

42751 GACCTGGAGA GCCTCTACCA CCCCGACCCC GACCACCCCG GCACCAGCTA 

42801 CGTCCGACGC GGCGGGTTCC TGTACGACGC CCCCGCCTTC GACGCGTCGT 
42851 TCTTCGGGAT CAGCCCGCGC GAAGCCCTGG CCATGGACCC GCAGCAGCGG 

42901 GTGCTCATGG AGACGGCCTG GCAGCTCCTG GAGCGGGCCG GCATCGACCC 

42951 GGCCTCGCTG AAGCTGAGCG CCACCGGCGT CTACATCGGC GCGGGCGTGC 

43001 TCGGGTTCGG CGGCGCGCAG CCCGACAAGA CGGTAGAGGG CCACCTCCTG 

43051 ACCGGCAGCG CGCTGAGTGT CCTGTCCGGC CGCATCTCCT TCACGCTCGG 

43101 CCTCGAGGGC CCGTCGGTCA GTGTCGACAC GGCGTGCTCC TCCTCGCTGG 

43151 TCTCCATGCA CCTGGCGGCC CAGGCGCTGC GGCAGGGGGA GTGCGATCTC 

43201 GCGCTGGCCG GCGGTGTCAC CGTGATGTCG ACGCCCGGCG CGTTCACCGA 

43251 GTTCTCCCGC CAGGGCGCGC TGTCTCCGGA CGGCCGCTCG AAGGCTTTCG 

43301 CGGCCTCGGC CGACGGCACC GGTTTCTCGG AGGGCGCGGG ACTGCTCCTC 

43351 CTGGAGCGGC TCTCCGACGC GCGCCGCAAC GGCCACAAGG TGCTCGCGGT 

43401 GATCCGCGGC TCGGCCGTCA ACCAGGACGG CGCGAGCAAC GGTCTCACCG 

43451 CCCCCAACGG CCCCTCCCAG GAACGCGTGA TCCGCGCCGC CCTCGCCAAC 

43501 GCGGGCCTGG GCGCCGCCGA GGTCGACGCG GTCGAGGCAC ACGGCACCGG 

43551 CACGAAGCTC GGCGACCCCA TCGAGGCCGG TGCGCTGCTC GCCACCTACG 

43601 GCCGCGACAG GGACGAGGAC CGGCCGCTGT GGCTGGGCTC GGTCAAGTCG 

43651 AACATCGGTC ACCCGCAGGG CGCAGCAGGC GTCGCGGGCG TCATCAAGAT 

43701 GGTGATGGCG CTGCAGCGCG AACTGCTCCC CGCCACCCTG TACGTCGACG 

43751 AGCCCACCCC GCACGTCGAC TGGTCCTCGG GCTCCGTCAG GCTCCTCACC 

43801 GAAGCGGTCC CGTGGACCCG CGGCGAGCGC CCGCGCCGCG CGGGCGTGTC 

43851 CGCCTTCGGC ATGTCCGGGA CGAACGCCCA CGTGATCCTG GAGGAGGCAC 

43901 CGCCCGAGGA GGCAGCGGCC GCGGAGACAC CGGCGGAAGG GACAGGCGCA 

43951 GTCGTCCGGT GGGTCGTCTC CGGCCGGGGC GAGGAAGCGC TGCGGGCCCA 

44001 GGCCGCACAG CTGGCCGAGC ACGTGCGCGA CGACGACCAG GGGCCGGCGT 

44051 CACCGCTGGA CGTGGGGTGG TCGCTGGCCA CGACACGGTC GGTGTTCGAG 

44101 AACCGGGCCG TCGTCGTCGG GGACGACCGC GACGCGCTCC TCGACGGCCT 
44151 CCGGTCGCTG GCGGCAGGTG AGGCGTCGCC GGACGTGGTG TCCGGGGCGG 

44201 TCGGCCCCAC GGGGCCCGGG CCGGTCATGG TGTTCCCCGG CCAGGGCGGC 

44251 CAGTGGGTGG GCATGGGGGC CCGGCTCCTC GACGAGTCCC CGGTGTTCGG 

44301 GGCCCGGATC GCCGAGTGCG AGCAGGCCCT GTCGGCGTAC GTGGACTGGT 

44351 CCCTGACCGA CGTGCTGCGC GGGGACGGGT CGGAGCTGGC CCGGATCGAC 

44401 GTCGTCCAGC CCGTGCTGTG GGCCGTCATG GTCGCGCTCG CCGCCGTCTG 

44451 GGCGGACCAG GGAATCGAAC CCGCCGCCGT CGTCGGCCAC TCGCAGGGCG 

44501 AGATAGCCGC GGCGTGCGTC GTGGGCGCCA TCTCCCTGGA CGAGGCGGCC 

44551 CGCATCGTCG CCGTACGCAG TGTGCTGCTG CGGCAGCTGT CCGGACGCGG 

44601 CGGCATGGCG TCCCTGGGGA TGGGCCAGGA GCAGGCCGCC GACCTGATCG 
44651 ACGGACACCC GGGTGTGGTC GTCGCGGCCG TCAACGGGCC GTCGTCCAGC 

44701 GTCATCTCGG GCCCGCCCGA GGGCATCGCC GCCGTCGTCG CCGACGCCCA 

44751 GGAGCGGGGC CTTCGCGCCA GGGCCGTCGC CTCCGACGTC GCGGGCCACG 

44801 GCCCGCAGCT GGACGCGATC CTGGACCAGC TCACGGAGGG CCTGGCCGGC 

44851 ATCCGGCCCG CCGCGACCGA CGTCGCGTTC TACTCCACCG TCACCGCCGG 

44901 GCACCTCACC GACACCACCG AACTCGACAC CGCGTACTGG GTGCGGAACG 

44951 TGCGCCGGAC GGTGCGTTTC GCCGACACGA TCGACGCGCT GCTCGCGGAC 

45001 GGGTACCGCC TGTTCATCGA GGTGAGCCCC CACCCCGTCC TCAACCTCGC 

45051 GCTGGAAGGC CTCATCGAAC GGGCGGCCGT GCCCGCCACG GTCGTGCCCA 

45101 CCCTGCGCCG CGACCACGGC GACACCAGCC AGCTCGCCCG CGCCGCGGCC 

45151 CACGGCTTCG CCGCCGGCGC GGACGTGGAC TGGCGGCGCT GGTTCCCGGC 

45201 CGACCCCGCC CCCCGTACCG TCGACCTGCC CACCTACGCC TTCCAGCGCC 

45251 AGGACTTCTG GCCGGCGCCC GCCGGCGGGC GGTCCGGCGA CCCTGCCGGG 

45301 CTCGGCCTCG CCGCCTCCGG ACACCGGCTC CTGGGCGCCT CCGTGGGCCT 

45351 CGCGAGCGGG GACGTACACC TGCTGAGCGG GCGGGTGTCC CGGCAGTCCG 

45401 CCGCGTGGCT GGACGACCAC GTCGTGGCGG GCCAGGCCCT GGTGCCCGGC 
45451 GCGGCGCAGG TGGAGTGGGT GCTGCGGGCC GGCGACGACG CGGGCTGCTC 

45501 CGCCCTGGAG GAGCTGACGC TCCAGACGCC GCTCGTGCTG CCCGACACCG 

45551 GCGGCCTGCG GATCCAGGTC GTCGTCGAAG CGGCCGACGC ACACGGCCGG 

45601 CGCGACGTCC GGCTGTTCTC CCGCCCCGAT GACGACGACG CCTTCGCGTC 

45651 GACGCACCCC TGGACCTGCC ACGCCACGGG CGTGCTCGCC CCCGCCCCGA 

45701 CGGACGGCAC CAACGGAACG CGGGACGCCG CCGACACCCT GGACGGCGCA 

45751 TGGCCCCCGG CCGACGCCGA ACCCGTCCCC GCCGACGACC TCTACGCGCA 

45801 GGCCGACCGC ACCGGATACG GCTACGGCCC CGCCTTCCGG GGCGTACGGG 

45851 CGCTGTGGCG CCACGGCAAG GACGTCCTGG CCGAGGTGAC GCTGCCCAAG 

45901 GAGGCCGGCG ACCCGGACGG CTTCGGTATC CACCCGGCCC TCCTCGACGC 

45951 CGTCCTGCAA CCCGCCGCAC TGCTGOTGCC CCCGACCGAC GCCGAACAGG 

46001 TCTGGCTGCC GTTCGCCTGG AACGACGTGG CGCTGCACGC CGTACGGGCC 

46051 ACCACGGTCC GGGTGCGCCT CACCCCGCTC GGGGAGCGGA TCGACCAGGG 

46101 GCTGCGCATC ACCGTGGCCG ACGCCGTGGG CGCGCCCGTG CTCACCGTCC 

46151 GCGACCTGCG CTCGCGCCCG ACCGACACAG GCCGCCTCGC CGCGGCCGCG 

46201 ACCCGCGACC GGCACGGGCT GTTCGACCTG GAGTGGATCG CGCCGGAGAA 

46251 CGCGGCGGAG AACGCGGCGG GTCCGGCCCG GGACGCGTCC GAAGGGTGGG 

46301 TGACACTCGG CGAGGACGCC GCGAGCCTCG CGGACCTGCT GGCGTCCGTC 

46351 GAGGCGGGCG CTCCGGCGCC GCAGCTCGTG GCCGCCCCCG TCGAACCCGA 

46401 CCGGACCGAC GACGGCCTGG CACTCGCCAC CCACGTCCTC GACCTCGTAC 

46451 AGACCTGGCT CGCCTCGCCC CTGCACGACT CCCGCCTGGT CCTGGTGACG 

46501 CGAGGGGGAG TGACGGATGC GGATGTGGAT GTGGCTGCCG CGGCCGTTTG 

46551 GGGTCTGGTA CGCAGCGCCC AGTCGGAGCA CCCCGGCCGC TTCACGCTGA 

46601 TCGACCTCGG CCCCGACGAC ACGCTTGCCG CAGCCATGCA GGCGGCGCAC 

46651 CTGGAAGAGC CGCAACVGGC GGTCCACGGC GGCGAGATAC GAGTGCCGCG 

46701 ACTGGTCCGC GCCAGGACCG ACCCGACCGC CCCGAACGGG ACACCGGAGG 
46751 CCGACCGGAC GGCGGACCCG TCCGAAGGAC TCCACCGGAA CGGTACGGTT 
46801 CTCATCACCG GCGGCACCGG CGTACTCGGC CGACTGGTGG CCGAACACCT 
46851 GGTCACGGAG TGGGGCGTAC GCCACCTGCT GCTCGCGAGC CGACGCGGCG 
46901 ACCAGGCGCC GGGTAGCGCC GAACTCCGCG CCCGCCTGAG CGAATTGGGA 
46951 GCATCGGTCG AGATCGCCCC GGCCGATGTC GGCGACGCGG AAGCGGTCGC 
47001 CGCACTGATC GCGTCGGTCG ACCCGGCGCA CCCGCTCACC GGTGTGATCC 
47051 ACGCGGCCGG TGTCCTGGAC GACGCCGTGA TCACCGCCCA GACCCCCGAG 
47101 AGCCTCGCGC GGGTGTGGGC GACGAAGGCG ACGGCGGCCC GCCATCTGCA 
47151 CGAGGCGACA CGGGAGACAC CCCTCGACTT CTTCGTGGTG TTCTCCTCGG 
47201 CGGCCGCCTC GCTCGGCAGC CCCGGCCAGG CCAACTACGC GGCGGCCAAC 
47251 GCCTATTGCG ACGCCCTCGT CCAGCACCGC CGCGCCCAAG GGCTCGCGGG 
47301 CCTCTCGATC GCCTGGGGCC TGTGGCAGGC GACCAGCGGC ATGACCGGGC 
47351 AGCTGAGCGA GACCGACCTG GCGCGCATGA AGCGCACCGG GTTCGCCGCG 
47401 CTGACCGACG AGGGCGGCCT GGCCCTGCTC GACGCCGCCC GTGCCCACGA 
47451 CCGGGCCTAC GTGGTCGGGG CCGACCTCGA CCCGCGCGCC GTGACCGATG 
47501 GCCTGTCCCC GCTCCTGCGC GCCCTCACGG CGCCCGCCAC GCGGCGGCGC 
47551 GTGGCCTCCG AAGGCCTCGC CGACGGGGCG CTCGCGACCC GCCTGGCCGG 
47601 CCTCGACGCG GACGGCCGCC TAAGGCTCCT CACCGATGTC GTACGCGAGT 
47651 ACGTCGCGGC CGTCCTCGGC CATGGTTCCG CCGCCCGGGT GGGCGTCGAC 
47701 ATCGCCTTCA AGGACCTGGG TTTCGACTCG CTGACCGCGG TGGAGCTGCG 
47751 CAACCGGCTG TCGGCCGCCT GTGACGTGCG GCTGCCCGCC ACACTGATCT 
47801 TCGACCACCC CACCCCGCAG GCTCTCGCCA CCCACCTGGT GGACCGCTTG 
47851 GCGGGCAGCA CCTCCGCGAC CACGACGGTG AATGCGACGG CGCCGGCAGC 
47901 CGCCCACGTC GCCGCAGGGG CCGACGTCGA CGCAGACACC GACGACCCGG 
47951 TCGCCATGGT CGCCATGACO TGCCGGTTCC CGGGCGGCGT CGCGTCCCCG 
48001 OACGACCTGT GGGACCTGCT CGACGCACGC AAGGACGCGA TGGGCGCCTT 
48051 CCCCACCGAC CGCGGCTGGG ACCTGGAACG CCTCTTCCAC CCCGACCCGG 

48101 ACCACCCCGG CACCAGCTAC ACCGACCAGG GCGGATTTCT TCCCGACGCG 

48151 GGTGATTTCG ATGCGGCGTT CTTCGGGATC AATCCGCGGG AGGCGCTGGC 

48201 GATGGATCCG CAGCAGCGGT TGTTGCTGGA GGCGTCGTGG GAGGTGTTGG 

48251 AGCGTGCGGG TATCGATCCG ACGACGCTCA AGGGCACCCC GACCGGCACC 

48301 TACGTGGGCC TCATGTACCA CGACTACGCC AAGTCCTTCC CCACGGCCGA 

48351 CGCCCAGTTG GAGGGCTACT CCTACTTGGC GAGCACCGGC AGCATGGTCT 

48401 CCGGCCGCGT CGCCTACACC CTGGGCCTTG AAGGTCCGGC GGTGACGGTC 

48451 GACACCGCGT GCTCCTGCTC CCTGGTCTCC ATCCACCTGG CGACGCAGGC 

48501 ACTCCGGCAC GGCGAGTGCG ACCTCGCCCT GGCAGGCGGT GTGACCGTCA 

48551 TGGCCGACCC GGACATGTTC GCGGGCTTCT CGCGCCAGCG CGGCCTCTCA 

48601 CCTGACGGCC GCTGCAAGGC CTACGCCGCC GCGGCCGACG GAGTCGGATT 

48651 CTCCGAGGGA GTGGGCGTAT TGCTCCTTGA GCGGTTGTCG GATGCGCGGC 

48701 GTCATGGGCG TCGGGTGTTG GGTGTGGTGC GGGGTTCGGC GGTGAATCAG 

48751 GACGGTGCGA GTAATGGGTT GACGGCGCCG AATGGTCCGT CGCAGGAGCG 

48801 GGTGATTCGT CAGGCGTTGG CCAGTGGTGG GTTGTCGTCG GTGGATGTTG 

48851 ATGTGGTGGA GGGGCATGGG ACGGGGACCA CGTTGGGTGA TCCGATCGAG 

48901 GCGCAGGCTC TGCTGGCCAC ATATGGGCAG GGGCGTCCGG AGGACCGTCC 

48951 GTTGTGGTTG GGGTCGGTGA AGTCGAACAT TGGTCATACG CAGGCGGCTG 

49001 CGGGTGTTGC GGGTGTCATC AAGATGGTGA TGGCGATGCG GCATGGTGTG 

49051 GTGCCGGCGA GTTTGCATGT GGATGTGCCG TCGCCGCATG TGGAGTGGGA 

49101 TTCGGGTGCG GTGCGGTTGG CGGTTGAGTC GGTGCCATGG CCGCAGGTGG 

49151 AGGGTCGTCC GCGTCGGGCG GGTGTGTCGT CGTTCGGCGC TTCGGGGACG 

49201 AATGCGCACG TGATCGTGGA GTCTGTTCCC GATGGGCTGG AGGAGGACTC 

49251 GGTATCGGTC GGCGGTGAGG CTCTTGAGAC GGAGACTGAC GGGCGCTTGG 

49301 TGCCGTGGGT GGTGTGGGCC CGCAGCCCGC AGGCCCTGCG CGACCAGGCA 
49351 CTACGCCTGC GTGACTTTGC CAGTGACGCG TCGTTCCGCG CGCCGCTCGC 

49401 CGACGTGGGC TGGTCGCTGC TGAAGACGCG TGCGCTGCAT GAGCATCGCG 

49451 CCGTTGTGGT GGGCGCGGAG CGGGCAGAGC TGATCGCCGC TCTGGAGGCG 

49501 CTGGCGACGG GTGAGCCGCA TGCGGCGCTG GTCGGCCCGG CTTGCTCGCA 

49551 GGCTCGGGTG GGTGGCGATG ACGTGGTGTG GCTGTTCAGT GGTCAGGGCA 

49601 GTCAGTTGGT CGGTATGGGT GCTGGTTTGT ATGAGCGGTT CCCGGTGTTT 

49651 GGGGCTGCGT TTGATGAGGT GTGCGGCCTG TTGGAGGGGC CGTTGGGCGT 

49701 GGAGGCGGGT GGGTTGCGGG AGGTGGTGTT CCGTGGCCCG CGGGAGCGGT 

49751 TGGATCACAC GGTGTGGGCG CAGGCGGGGT TGTTTGCGCT GCAGGTGGGG 

49801 TTGGCCCGGT TGTGGGAGTC GGTCGGGGTG CGGCCGGATG TGGTGCTCGG 

49851 GCATTCGATC GGTGAGATCG CGGCCGCGCA TGTGGCGGGG GTTTTTGATC 

49901 TGGCGGATGC GTGTCGGGTG GTGGGTGCGC GGGCGCGTTT GATGGGTGGG 

49951 CTGCCTGAGG GTGGGGCGAT GTGCGCGGTG CAGGCCACGC CCGCCGAGCT 

50001 GGCCGCCGAC GTGGACGGAT CGGCTGTAAG TGTGGCGGCA GTCAACACCC 

50051 CCGACTCCAC GGTGATTTCG GGCCCGTCGG ACGAGGTGGA CCGGATTGCT 

50101 GGGGTGTGGC GGGAGCGTGG GCGCAAGACG AAGGCGCTGA GCGTCAGTCA 

50151 TGCCTTCCAT TCGGCGTTGA TGGAGCCGAT GCTCGCGGAG TTCACCGAAG 

50201 CGATACGAGG GGTCAAGTTC AGGCAGCCGT CGATCCCGCT CATGAGCAAT 

50251 GTCTCCGGAG AGCGGGCCGG CGAGGAGATC ACGGATCCGG AGTACTGGGC 

50301 GAGGCATGTA CGTAATGCGG TGCTCTTCCA GCCCGCCATC GCCCAAGTAG 

50351 CGGATTCAGC GGGCGTGTTT GTGGAGCTCG GCCCCGCGCC TGTGCTGACC 

50401 ACGGCCGCCC AGCACACCCT GGACGAGTCG GACAGCCAGG AGTCGGTGCT 

50451 GGTCGCGTCT CTCGCCGGTG AGCGTCCTGA GGAGTCGGCG TTTGTGGAGG 

50501 CGATGGCTCG TCTGCATACC GCTGGTGTTG CTGTGGACTG GTCGGTGTTG 

50551 TTCGCGGGTG ATCGTGTGCC TGGGCTGGTG GAGTTGCCGA CGTATGCGTT 

50601 CCAGCGGGAG CGGTTCTGGT TGAGTGGCGG TTCTGGGGGT GGGGATGCGG 
50651 CGACTTTGGG GTTGGTGGCG GCGGGGCATC CGTTGTTGGG TGCGGCGGTG 
50701 GAGTTCGCGG ACCGGGGTGG GTGTCTGCTG ACCGGTCGTC TGTCGCGGTC 

50751 TGGGGTGTCG TGGCTTGCTG ATCATGTGGT GGCGGGTGCG GTTTTGGTGC 

50801 CGGGTGCTGC GTTGGTGGAG TGGGCGTTGC GGGCCGGTGA TGAGGTCGGT 

50851 TGTGTGACGG TGGAGGAGTT GATGTTGCAG GCGCCTTTGG TGGTGCCTGA 

50901 GGCGTCGGGT CTGCGGGTTC AGGTGGTGGT TGAGGAGGCG GGTGAGGACG 

50951 GGCGGCGCGG TGTTCAGATC TACAGCCGGC CCGACGCGGA CGCCGTGGGC 

51001 GGCGATGACT CGTGGATCTG CCACGCGACC GGCGTACTGT CACCCGAAAG 

51051 CGCTCGTCTG GACACGGAGT TGGGTGGCGT CTGGCCACCG GCCGGTGCCG 

51101 AACCGCTGGA TGTCGACGGC TTCTACGCGC AGGCCGGTGA GGCCGGGTAC 

51151 GGATACGGTC CGGCGTTCCG GGGGCTGCGT GCCGTGTGGC GGCACGGCCA 

51201 GGACCTGCTG GCCGAGGTCG TCCTGCCCGA AGCCGCCGGT GCCCATGACG 

51251 GCTACGGGAT CCACCCCGCC CTCCTCGACG CCACCCTCCA TCCGCTGCTC 

51301 GCCGCCCGCT TCATGGACGG TTCCGAGGAC GATCAGCTCT ACGTACCGTT 

51351 CGGGTGGGCC GGAGTGTCTC TGCGGGCGGT GGGAGCCACG ACTGTGGGCG 

51401 TGCGCCTCCG TCCGGTCGGG GAGAGCGTCG ACCAAGGGCT GAGCGTGACG 

51451 GTCACCGATG CGACCGGCGG TCCCGTTCTG AGCGTCGACT CCCTCCAGAC 

51501 CCGCCCCGTG AAGCCGAGCC AATTGGCTGC GGCCCAACAG CCGGACGTAC 

51551 GCGGTCTGTT CACTGTGGAG TGGACGCCGC TGCCGCAGAC GGATGCCGAC 

51601 GGGGAGGCCG ACTGGGTTGT GCTCTCGGAC GGTGTTGGCC GTCTGGCTGA 

51651 TGTGGTGTCG GCGGCGGGTG GTGAAGCGCC GTGGGCAGTG GTCGCTCCTG 

51701 TCGATGCGTC TGTGGGCGAC GGCCGTGAGG GTCTTGACGG TCGGCTGGTC 

51751 GTGGAGCGGG TGCTGTCACT CGTACAGGAG TTCCTGGCCC TGCCGGAGCT 

51801 GGCCGAGTCC CGTCTCCTCG TGGTGACGCG CGGTGCGGTG GCCACCGGCG 

51851 TCGACGGTGA CGGTGACGTG GACGCGTCCG CCGCAGCTGT ATGGGGCCTG 

51901 GTCCGCAGTG CTCAGTCCGA GAATCCGGGC CGCTTCATCC TGCTCGACGT 
51951 GGACGGCGAC GGCGACGACC AGGGCCCGGA CCTGAACGGC CGGCATCTGC 

52001 CCCACGCCAC CCTGCGTCAC GCCGCCGAGG AACTCGACGA GCCCCAACTC 

52051 GCGCTGCGGG AAGGGACGCT CTACGTCCCC CGACTGACCC AGGCGCGCCA 

52101 GTCCGCCGAA CTCGTCGTGC CGCCCGGTGA ACCGGCGTGG CGCCTGCGGA 

52151 TGGTGCACGA CGGCTCGCTG GACGCCCTGG CGGCAGTGGC CTGCCCGGAG 

52201 GCCCTGGAGC CCTTGGCGCC GGGGCAGGTG CGTATCGCCG TACACGCd&C 

52251 GGGCATCAAC TTCCGTGACG TACTGGTGGC CTTGGGTATG GTGCCCGCGT 
52301 ACGGGGCCAT GGGTGGCGAA gStGCCGGTG TCGTGACGGA GGTCGGTCCC 

52351 GAGGTCACCG ATGTCTCGGT GGGCGACCGC GTGATGGGCG TGTTCGAGGG 

52401 CGCGTTCGGC CCTGTGGTGA TCGCCGAGGC GCGGATGGTC ACACCTGTCC 

52451 CGCAGGGCTG GGACATGCGG GAGGCGGCCG GTATTCCGGC GGCCTTCCTG 

52501 ACGGCTTGGT ACGGGTTGGT GGAGCTGGCC GGTCTGAAGG CGGGCGAGCG 

52551 GGTGCTGGTC CATGCCGCGA CGGGTGGTGT GGGGATGGCG GCGGTGCAGA 

52601 TCGCCCGGCA TGTGGGTGCC GAGGTGTTCG CCACCGCGAG TCCGGGCAAG 

52651 CACGCCGTGC TGGAGGAGAT GGGCATCGAC GCCGCCCACC GCGCCTCCTC 

52701 CCGGGACCTC GCCTTCGAGG GCAGGTTCAG GGAAGCAACG GGCGGCCGCG 

52751 GCATGGACGT CGTGCTCAAC AGCCTTGCCG GCGAGTTCAT CGACGCCTCT 

52801 CTGCGGTTGC TCGGCGACGG CGGCCGGTTC CTGGAGATGG GCAAGACCGA 

52851 TGTGCGGGCC GCCGAAGAGG TGGCTGCGGA GCACGCGGAC GTCTCGTACA 

52901 CGGCGTACGA CCTCGTGGGT GATGCCGGAC CCGACCGCAT CAGCAACATG 

52951 CTGCACAAGC TCGTCGAATT GTTCGCCTCA GAACGGCTTA AGCCGCTGCC 

53001 GGTACGTTCC TGGCCGCTGG ACAAGGCGCA GGAGGCGTTC CGGTTCATGA 

53051 GTCAGGCGAA GCACACCGGC AAGCTGGTGC TTGAGATCCC GCCTGCCCTC 

53101 GACCCCGAGG GCACCGTTCT GGTCACGGGG GGCACCGGTG CGCTGGGGCA 

53151 GGTCOTGGCC GAGCATCTGG TCCGGGAGTG GGGCGTACGG CACCTGCTGC 

53201 TGGCCAGCCG TCGCGGTCCG GAGGCCCCGG GCAGCGACGA ACTGGGCTCG 
53251 AAGCTCACCG GGTTGGGTGC CGAGGTCACC ATTGTCGCGG CCGATGTCAG 

53301 CGACCCGGCC TCGGTGGTGG AGCTGGTCGG CAAGACGGAT CCCTCGCATC 

53351 CGTTGACGGG TGTCGTGCAC GCGGCGGGCG TGTTGGAGGA CGGTGTCGTG 

53401 ACCGCTCAGA CGCCTGAGGG GCTGGCGCGG GTGTGGGCGG CCAAGGCTGC 

53451 TGCGGCGGCG AATCTCCATG AGGCGACCCG GGAGATGCGT CTCGGCCTGT 

53501 TCGTGGTGTT CTCCTCGGCG GCCGCCACGC TCGGCAGTCC GGGCCAGGCC 

53551 AACTACGCGG CCGCCAATGC CTATTGCGAC GCGCTGATGC AGCACCGACG 

53601 GGCGGTGGGC CAGGTCGGCC TCTCGGTCGG CTGGGGTCTC TGGGAGGCGC 

53651 CGGACGCCAA GCCGGGTGTT GCCGCCGACG CCAAGGCGAG TGCTGCCACC 

53701 GTCGGCAAGG CGAGTGCTCT ATCCGACGGC AGGAACGGCA GCGCTCCCCA 

53751 GGACACGACC GGCACCGCCC CCCAGGGCAT GACCGGCGGA CTCACCGACA 

53801 CCGACGTAGC CCGCATGGCA CGTATCGGCG TCAAGGGCAT GAGCAACGCC 

53851 CACGGTCTCG CCCTGTTCGA CGCCGCGCAC CGCCACGGCC GCCCCCACCT 

53901 GGTCGGCTTC AACCTCGACC TGCGCACCCT GGCCACGCAC CCCCTGCACA 

53951 CCCGGCCCGC CCTTCTGCGC GGCCTGGCCA CCCCCACCGC CGGCXX3GGCG 
54001 AGCAGGCCGA CCGCGACCGC GGGCGGACAG CCCGCCGACC TGGCGGGCCG 

54051 GCTGGCCGCG CTGTCGCCGT CGGACCGGCA CCACACGCTG GTCCGGCTCA 

54101 TCAGGGAACA GGGCGCCACC GTGCTCGGGC ACCACCCGGA CAGTCTCACC 

54151 ACGGGCAGCA CCTTCAAGGA ACTCGGATTC GACTCCCTGA CCGCGGTCGA 

54201 ACTGCGCAAC AGGCTGTCCG CCGCCACCGG TCTCCGGCTC CCCGCCGGCC 

54251 TGGTCTTCGA CCACCCGGAC GCCGACATCC TGGCCGAACA CCTCGGCGCG 

54301 CAACTCGCCC CCGACGGGGA CACCCCCGCC GGTGCGGAAG CCACCGACCC 

54351 GGTCCTCCGC GACCTGGCGA AACTCGAGAA CGCCCTCTCC TCCACCCTCG 

54401 TCGAGCACCT CGACGCGGAC GCGGTCACGG CCCGACTGGA AGCACTCCTG 

54451 TCGAACTGGA AGGCGGGGAG GGCGGCGCGC CGCTCGGGCA GCAGGAAGGA 

54501 GCAGCTCCAO GTTGCCAGGA CGGACCAGGT CCTCGACTTC ATCGACAAAG 
54551 AACTGGGTGT GTGAAACGAC CGTGCACGGC GCGACAACCA CGCTGAAGGC 

54601 TGGGTGAACT CTCATGGCGA GTGAAGAGGA ACTGGTCGAC TACCTCAAGC 

54651 GGGTCGCCGC CGAACTGCAC GACACCCGGC AGCGCCTGCG CGAGGTCGAG 

54701 GACCGGCGGC AGGAGCCGGT GGCCGTCGTC GGCATGGCCT GCCGTTTCCC 

54751 CGGCGGCATC GAGACGCCCG AGGGACTGTG GGAGCTGGTC GCGGCCGGCG 

54801 ACGACGCCAT TGAGCCCTTC CCCACCGACC GGGGCTGGGA CCTGGAaSsC 

54851 ATCTACCACC CGGACGCCGA CCACCCGGGT ACCTGCTACG TGCGGGAGGG 

54901 CGGGTTCCTA GCCGCCCCTG aHcGGTTCGA CTCCGACTTC TTCGGCTTCA 

54951 GCCCGCGCGA GGCCCTGGCC AGCAGCCCGC AACTGCGACT GCTCCTGGAG 

55001 ACGTCCTGGG AGGCCCTCGA ACGGGCGGGC ATCAACCCCG CCTCGCTCAA 

55051 GGGCAGCCCC ACCGGCGTCT ACGTCGGCGC CGCGACCACC GGCAACCAGA 

55101 CGCAGGGCGA CCCCGGCGGC AAGGCGACCG AGGGTTACGC GGGCACCGCG 

55151 CCCAGCGTCC TCTCGGGCCG GCTCTCGTTC ACGCTCGGCC TGGAGGGCCC 

55201 GGCGGTGACC GTCGAGACAG CGTGCTCCTC CTCGCTGGTG GCGATGCACC 

55251 TGGCGGCCAA CGGCCTGCGC CAGGGCGAGT GCGACCTCGC CCTCGCGGGC 

55301 GGCGTCACCG TCATGTCCAC CCCCGAGGTG TTCACAGGCT TCTCGCGTCA 

55351 GCGGGGACTG GCCCCCGACG GCCGCTGCAA GCCGTTCGCC GCCGCGGCCG 

55401 ACGGCACGGG CTGGGGCGAG GGCGCGGGCC TGATCCTCCT GGAGCGCCTC 

55451 TCCGACGCCC GCAGGAAGGG CCACAAGGTG CTCGCGGTGA TCCGGGGCTC 

55501 GGCGATCAAC CAGGACGGCG CGAGCAACGG CTTCACCGCG CCCAACGGCC 

55551 CCTCGCAGCG CCGCGTCATC CGCCAGGCAC TCTCCAGCGC CCACCTCTCC 

55601 ACGTCGGAGA TCGACGTCGT CGAGGCGCAC GGCACCGGCA CCAGGCTCGG 

55651 CGACCCCATC GAGGCCGAGG CGCTCATCGC CACCTACGGC AAGGAGCGCG 

55701 AGGACGACCG TCCCCTGTGG CTCGGCTGGG TCAAGTCCAA CATCGGCCAC 

55751 ACGCAGGCCG CCGCGGGCGT CGCGGGAGTC ATCAAGATGG TGATGGCGCT 

55801 ACAGCGCGAA CTGCTTCCCG CCACCCTGAA CGTCGACGAG CCGACCCCGC 
55851 ACGTCCAGTG GGAGGGCGGC GGCGTACGCC TCCTGACCGA ACCGGTCCCG 

55901 TGGTCGCGCG GCGAACGCCC GCGCCGCGCC GGAATCTCCT CCTTCGGCAT 

55951 ATCGGGCACG AACGCGCACG TGGTCCTGGA GGAGGCGCCG CCGGAGGAGG 

56001 ACGTGCCGGG CCCCGTGGCT GCGGAGCCGG AAGGGGTGGT GCCGTGGGTG 

56051 GTCTCCGCGC GGACCGAGGA GGCGTTGAGC GAACAGGCGC GGCGCCTGGG 

56101 CGAGTTCGTG GCCGACACGG ACCCGTCGAC CGCTGACGTC GGGTGGTC^C 

56151 TGACCACGAG CAGGGCGATC CTTGAACACC GCGCTGTGGT GGTGGGGCGT 

56201 GATCGGGATG CGCTGACGGC CGGCCTGGCG GCGTTGGCCG CGGGTGAGGA 

56251 GTCGGCGGAT GTGGTGGCTG GGGTGGCCGG TGATGTGGGT CCTGGGCCGG 

56301 TGTTGGTGTT TCCGGGGCAG GGGTCGCAGT GGGTGGGCAT GGGCGCCCAG 

56351 CTCCTTGACG AGTCGCCCGT CTTCGCGGCG CGGATCGCGG AGTGTGAGCA 

56401 GGCGCTGTCG GCGTACGTGG ACTGGTCGCT GAGTGCGGTG TTGCGCGGGG 

56451 ATGGGAGTGA ACTGTCCCGG GTCGAGGTCG TGCAGCCGGT GTTGTGGGCG 

56501 GTGATGGTCT CGCTGGCTGC CGTCTGGGCG GATTACGGGG TCACCCCGGC 

56551 CGCTGTGATC GGGCACTCGC AGGGCGAGAT GGCCGCCGCG TGCGTGGCGG 

56601 GGGCGCTGTC TTTGGAGGAT GCGGCGCGCG TCGTGGCCGT ACGCAGTGAC 

56651 GCGCTTCGTC AGCTGATGGG GCAGGGCGAC ATGGCGTCGT TGGGCGCCAG 

56701 CTCGGAGCAG GCGGCTGAGC TCATCGGTGA TCGGCCGGGC GTATGCATCG 

56751 CAGCGGTCAA CGGGCCGTCC TCGACAGTCA TTTCAGGACC GCCGGAGCAT 

56801 GTGGCAGCCG TGGTCGCGGA TGCGGAGGAA CGTGGTCTGC GCGCCCGTGT 

56851 CATCGATGTC GGCTATGCCT CGCACGGTCC CCAGATCGAT CAGCTCCACG 

56901 ACCTCCTCAC CGACCGGCTC GCCGACATCC GGCCCGCGAC CACGGACGTG 

56951 GCCTTCTATT CGACGGTCAC CGCCGAGCGC CTGAGGGACA CCACGGCCCT 

57001 GGATACGGAT TACTGGGTTA CCAACCTCCG CCAGCCGGTC CGTTTCGCCG 

57051 ACACCATCGA TGCGCTTCTC GCGGACGGCT ATCGCCTGTT CATCGAGGGC 

57101 AGCGCGCACC CGGTGCTGGG TCTGGGCATG GAGGAGACCA TCGAGCAGGC 
57151 GGACATCCCC GCCACGGTCG TCCCCACCCT GCGCCGCGAT CACGGTGACA 

57201 CCACCCAGCT CACCCGTGCC GCAGCGCACG CCTTCACCGC CGGCGCCACC 
57251 GTCGACTGGC GGCGCTGGTT CCCGGCCGAC CCCACCCCCC GCACGATCGA 

57301 CCTGCCCACC TACGCCTTCC AGCGCCGCAG CTACTGGTTG CCGGTGGACG 
57351 GTGTCGGAGA TGTGCGGTCG GCCGGGCTGC GGCGGGTGGA ACACTCGCTG 
57401 TTGCCCGCGG CGCTCGGTCT CGCCGATGGT GCGCTCGTGC TGACCGGAtG 
57451 GCTCGCGGCG TCCGGTGGTG GTGGCGGTTG GCTCGCGGAT CACGCGGTGG 
57501 CGGGCACGAC GCTCGTCCCC GGTGCCGCGC TGGTCGAGTG GGCGTTGCGG 
57551 GCCGCCGACG AGGCGGGCTG CCCCTCCCTT GAGGAGCTGA CGCTCCAGGC 
57601 ACCTCTGGTG CTGCCCGGCT CCGGGGGCCT CCAGGTCCAA GTGGTCGTGG 
57651 GTCCGGCCGA CGGACAGGGC GGCCGGCGTG AGGTGCGCGT CTTCTCGCGT 
57701 GTCGACTCGG ACGACGAGGC AGCGGGGCAG GACGAGGGGT GGTCGTGTCA 
57751 CGCGACCGGT GTGCTGAGCC CCGAGCCCGG TGCGGTACCG GACGGGCTCA 
57801 GCGGACAGTG GCCGCCGACG GGCGCCGAGC CGCTGGAGAT CAGTGATCTC 
57851 TACGAGCAGG CGGCATCGGC GGGATACGAG TACGGGCCGT CGTTCCGGGG 
57901 CCTGCGCTCC GTGTGGCGGC ACGGGCATAA CCTGCTGGCA GAGGTGGAGC 
57951 TGCCCGAACA GGCAGGTGCG CACGACGACT TCGGCATCCA CCCCGTACTG 
58001 CTGGACGCCG CGCTGCACCC GGCGCTGCTG CTCGACCAGA ACGCGCCCGG 
58051 CGAAGAGCAA GAGCCAGCCC AGCCCGCTCT TCGCCTGCCG TTCGTGTGGA 
58101 ACGGCGTCTC CCTGTGGGCC ACCGGCGCCG CGACCGTGGG GGTACGGCTG 
58151 GCCCCGCACG GGGGAGGGGA GACGGACGAT AGCGCCGGGC TGCGCGTGAC 
58201 GGTCGCCGAC GCCACCGGAG CACCGGTGCT GAGCGTGGAC TCCCTCGCTC 
58251 TGCGCCCCGC TGACCCCGAA CTGCTGCGCA CGGCCGGTCG GGCGGGCAGC 
58301 GGGACCAACG GCTTGTTGAC CGTGGAGTGG ACCGCTCTGC CCCCGGCGGA 
58351 CGTGGCGGAC CACGCGGCAG GCGACGGCTG GGGGGTGCTC GGTCAGGACG 
58401 TACCCGACTG GGCCGGAGCG GACATGCCCC GGCATCCCGA CATGGCCTCC 
58451 CTGTCGGCCG CGCTGGACGA GGGAACGCAG GCCCCTGCGG CCGTCTTCGT 

58501 GGAGACCACA GCCACATCGC ACGCCACACC GAACACCGCA GCGGACGTGA 

58551 CGCTCGACGC GTCCGGCCGG GCGGTCGCCG AGCGCACCCT GCACCTGCTG 

58601 CGGGACTGGC TCGCCGAACC GCGCCTCGCC GAGACCCGGC TCGTCCTCAT 

58651 CACCCACCAC GCGGTGACGA CCCCGGCGGA CGACGACGTG AACGCCGCAC 

58701 CCCTCGACGT CCCGGCCGCC GCCCTGTGGG GACTGATCCG CAGCGCaS^G 

58751 GCCGAACACC CGGACCGCTT CGTTCTGTTG GACACCGACG CGAAGGCCAA 

58801 CACCGACCCC GGCCCCGACA CCAGTACTGA CCACAGCACC GCATCGGGTA 
58851 CGTACCGAAC CGTCATCGCG CGGGCCCTCG CCACCGGGGA GCCACAGCTG 

58901 GCCGTGGGCG CGGGAGAACT GCTGGCTCCC CGCCTCGCCC GAGCCGCCAC 

58951 CCCCACACCC GAGACCCCCA CACCCGAGAC ACAGCCCGAC ACCGGATCCG 

59001 GGTCCGAGGC CGGGGCCGGG TCCGGATCTG GACCCGGCGC GACACTGGAC 

59051 CCCGACGGCA CCGTCCTCAT CGCGGGCGGC ACCGGCATGA TGGGTGGTCT 

59101 CGTCGCCGAA CACCTGGTCC GCGCCTGGTC GGTGCGGCAC CTCCTGCTCG 

59151 TCAGCCGGCA AGGGCCCGAC GCGCCGGACG CCCGCGACCT CGCCGACCGG 

59201 CTGGTCGGCC TGGGCGCGAC GGTACGGATC GTCGCGGCCG ACCTGACGGA 

59251 CGGGCGGGCC ACCGCGGACC TCGTCGCGTC GGTCGACCCG GCGCACCCGC 

59301 TCACCGGTGT GATCCACGCG GCCGGCGTCC TGGACGACGC CGTGGTCACC 

59351 GCGCAGACCT CCGACCAGCT GGCCAGGGTG TGGGCGGCCA AGGCGTCCGT 

59401 CGCCGCCAAC CTGGACGCGG CCACGTCGGA GCTGCCGCTC GGCTTGTTCC 

59451 TGATGTTCTC GTCCGCCGCC GGTGTCCTCG GCAACGCGGG CCAGGCCGGT 

59501 TACGCGGCCG CCAACGCCTT CGTCGACGCC CTGGTCGGCC GCCGTCGCGC 

59551 CACCGGCCTG CCCGGCCTGT CGATCGCCTG GGGCCTGTGG GCGCGCGGCA 

59601 GCGCCATGAC CCGGCACCTG GACGAGGCCG ACCTCGCGCG GCTGCGTGCC 

59651 GGCGGGGTCA AGCCCCTGCT GGACGAGCAG GGCCTCGCCC TCCTCGACGC 

59701 GGCGCGCGCC ACCGCCGCGC ACACCTCGCT GGTGGTCGCG GGCGGTATCG 
59751 ACGTACGCGG ACTGAACAGG GACGACGTCC CCGCGATCCT CCGCGACCTG 

59801 GCGGGCCGGA CCCGCCGCAG GGCGGCCGCC GACTCCACCG TCGACCAGGC 

59851 CGCGCTGGAG CGGCGCCTCA CGGGCCTGGA CGAGGCCGAG CGCCGGGCTG 

59901 TCGTCACCGA CGTCGTACGC GAATGCGTGG CGGCCGTGCT CGGCCACCGG 

59951 TCGGCGGCCG ACGTACGCAC CGAGGCCAAC TTCAAGGACC TCGGCTTCGA 

60001 CTCGCTCACT GCGGTGCAGC TGCGCAACCG CCTCTCGGCG GCGAGCGGCC 

60051 TCCGCCTGCC CGCCACCCTG GCCTTCGACC ACCCCACCCC CCAGGCGCTG 

60101 GCGGCGTACC TGGGCACGCG cfCTGAGCGGC CGGACCGCCA CCCCCGTCGC 

60151 ACCCGTGGCG CCTTCCGCGG CCGCGACGGA CGAGCCGGTG GCGATCGTCG 

60201 CGATGGCCTG CAAGTACCCG GGTGGAGCGA CCTCGCCGGA AGGCCTCTGG 

60251 GACCTGGTCG CGGAGGGCGT GGACGCGGTC GGCGCCTTCC CGACGGGCCG 

60301 CGGCTGGGAC CTCGAACGGC TCTTCCACCC CGACCCGGAC CACCCCGGCA 

60351 CGAGTTACGC CGACGAAGGG GCCTTCCTTC CTGACGCGGG CGATTTCGAT 

60401 GCGGCGTTCT TCGGGATCAA TCCGCGGGAG GCGCTGGCGA TGGATCCGCA 

60451 GCAGCGGCTG TTGCTGGAGG CGTCGTGGGA GGTGTTGGAG CGTGCGGGTA 

60501 TCGACCCGAC GACGCTCAAG GGCACCCCGA CCGGCACGTA CGTCGGCGTG 

60551 ATGTACCACG ACTACGCGGC AGGCCTCGCC CAGGACGCCC AACTGGAGGG 

60601 CTACTCCATG CTCGCCGGCT CCGGCAGCGT GGTGTCCGGC CGCGTCGCCT 

60651 ACACCCTGGG GCTTGAGGGT CCTGCGGTGA CGGTCGACAC CGCGTGCTCC 

60701 TCGTCCCTGG TCTCCATCCA CCTGGCCGCG CAAGCACTGC GACAGGGCGA 

60751 GTGCACTCTC GCCCTCGCGG GCGGCGTGAC CGTCATGGCC ACGCCCGAGG 

60801 TGTTCACCGG ATTCTCGCGC CAGCGCGGCC TGGCCCCCGA CGGCCGCTGC 

60851 AAGCCGTTCG CCGCCGCCGC CGACGGCACC GGCTGGGGCG AGGGTGTCGG 

60901 TGTGTTGTTG CTCGAGCGGT TGTGGGATGC GCGGCGTCAT GGGCGTCGGG 

60951 TGTTGGGTGT GGTGCGGGGT TCGGCGGTGA ATCAGGACGG TGCGAGTAAT 

61001 GGGTTGACGG CGCCGAATGG TCCGTGGCAG GAGGGGGTGA TTCGTCAGGC 
61051 GTTGGCCAGT GGTGGGTTGT CGTCGGTGGA TGTTGATGTG GTGGAGGGGC 

61101 ATGGGACGGG GACCACGTTG GGTGATCCGA TCGAGGCGCA GGCTCTGCTG 

61151 GCCACGTATG GGCAGGGGCG TCCGGTGGAT CGTCCGTTGT GGTTGGGGTC 

61201 GGTGAAGTCG AATATTGGTC ATACGCAGGC GGCTGCGGGT GTTGCGGGTG 

61251 TCATCAAGAT GGTGATGGCG ATGCGGCATG GTGTGGTGCC GGCGAGTTTG 

61301 CATGTGGATG TGCCGTCGCC GCATGTGGAG TGGGATTCGG GTGCGGTGCG 

61351 GTTGGCGGTT GAGTCGGTGC CATGGCCGGA GGTGGAGGGT CGTCCGCGTC 

61401 GGGCGGGTGT GTCGTCGTTC GGGGCTTCGG GAACGAATGC GCACGTGATC 

61451 GTGGAGTCTG TGCCCGATGG GCTGGGGGAG GACTCGGTAT CGGTCAGTGG 

61501 TGAGGCTCCC GAGACTGAGA CTGACGGGCG CTTGGTGCCG TGGGTGGTAT 

61551 CGGCCCGCAG CCCGCAGGCC CTGCGCGACC AGGCACTACG CCTGCGTGAT 

61601 GCGGTGGCGG CCGACTCAAC GGTGTCGGTG CAGGATGTGG GCTGGTCGCT 

61651 GCTGAAGACG CGTGCGCTGT TCGAGCAGCG GGCGGTGGTG GTGGGGCGTG 

61701 AGAGGGCTGA ACTCCTGTCG GGGCTTGCTG TGTTGGCCGC TGGCGAGGAG 

61751 CACCCGGCTG TGACGCGGTC CCGTGAGGAC GGGGTTGCTG CGAGCGGTGC 

61801 TGTGGTGTGG CTGTTCAGTG GTCAGGGCAG TCAGTTGGTC GGTATGGGTG 

61851 CTGGTTTGTA TGAGCGGTTC CCGGTGTTTG CGGCTGCGTT TGATGAGGTG 

61901 TGCGGCCTGT TGGAGGGGCC GTTGGGCGTG GAGGCGGGTG GGTTGCGGGA 

61951 GGTGGTGTTC CGTGGCCCGA GGGAGCGGTT GGATCACACG ATGTGGGCGC 
62001 AGGCGGGGTT GTTTGCGCTG CAGGTGGGGT TGGCCCGGTT GTGGGAGTCG 

62051 GTCGGGGTGC GGCCGGATGT GGTGCTCGGG CATTCGATCG GTGAGATCGC 

62101 GGCCGCGCAT GTGGCGGGGG TCTTTGATCT GGCGGATGCC TGTCGGGTGG 

62151 TGGGGGCGCG GGCCCGTTTG ATGGGTGGGC TGCCTGAGGG CGGGGCGATG 

62201 TGCGCGGTGC AGGCCACGCC CGCCGAGCTG GCCGCCGACG TGGACGACTC 

62251 TGGTGTGAGT GTGGCGGCGG TCAACAGACC TGATTCGACG GTGATTTCAG 

62301 GGCCGTCTGG TGAGGTGGAT CGGATTGCTG GGGTGTGGCG GGAGCGTGGG 
62351 CGTAAGACGA AGGCGCTGAG CGTCAGTCAT GCCTTCCACT CGGCGTTGAT 
62401 GGAGCCGATG CTCGCGGAGT TCACCGAAGC GATACGAGAG GTCAAGTTCA 
62451 CGCGGCCGAA GGTGTCGTTG ATCAGCAACG TCTCTGGTCT GGAGGCGGGT 
62501 GAGGAGATCG CGTCCCCGGA GTACTGGGCA CGCCATGTAC GCCAGACAGT 
62551 GCTCTTCCAG CCCGGCATCG CCCAAGTGGC TTCCACGGCA GGCGTGTTTG 
62601 TCGAGCTCGG CCCCGGCCCC GTACTGACTA CTGCCGCCCA GCACACCCTG 
62651 GACGACGTAA CCGATAGGCA TGGCCCCGAA CCGGTACTGG TGTCCTCGCT 
62701 GGCCGGTGAG CGTCCTGAGG AGTCGGCGTT CGTGGAGGCG ATGGCTCGTC 
62751 TGCATACCGC TGGTGTTGCT GTGGACTGGT CGGTGTTGTT CGCGGGTGAT
62801 CGTGTGCCTG GGCTGGTGGA GTTGCCGACG TATGCGTTCC AGCGGGAGCG
62851 GTTCTGGTTG AGCGGCCGTT CTGGGGGTGG GGATGCGGCG ACTTTGGGTC 
62901 TGGTGGCGGC GGGGCATCCG TTGTTGGGTG CGGCGGTGGA GTTCGCGGAC
62951 CGGGGTGGGT GTCTGCTGAC CGGTCGGCTG TCGCGGTCTG GGGTGTCGTG
63001 GCTTGCTGAT CATGTGGTGG CGGGTGCGGT TTTGGTGCCG GGTGCTGCGT
63051 TGGTGGAGTG GGCGTTGCGG GCCGGTGATG AGGTCGGTTG TGTGACGGTG 
63101 GAGGAGTTGA TGTTGCAGGC GCCTTTGGTG GTGCCTGAGG CGTCGGGTCT
63151 GCGGGTTCAG GTGGTGGTCG AGGAGGCGGG TGAGGACGGG CGGCGCGGTG 
63201 TCCAGATCTA TAGCCGGCCT GACGCGGACG CCGTGAGCGG CGACGACTCG
63251 TGGATCTGCC ACGCGACCGG CACCCTCACC CCCCAGCACA CCGACGCTCC 
63301 GAACGACGGA CTGGCCGGCG CGTGGCCCGC GGCGGGCGCC GTGCCGGTGG 
63351 ACCTGGCGGG CTTCTACGAG CGCGTGGCGG ACGCGGGCTA TGCGTACGGC 
63401 CCGGGGTTCC AGGGGCTGCG TGCCGTGTGG CGGCACGGTC AGGACCTGCT
63451 GGCCGAGGTC GTCCTGCCCG AAGCCGCGGG TGCCCATGAC GGCTACGGCA 
63501 TCCACCCCGC CCTCCTCGAC GCCACCCTCC ACCCGGCCCT GCTCCTCGAC 
63551 TGGCCCGGGG AGGTGCAGGA CGACGACGGG AAGGTGTGGC TGCCTTTCAC 
63601 CTGGAACCAG GTCTCCTTGC GGGCTGCGGG AGCCGCCACC GTACGCGTAC
63651 GTCTCTCGCC CGGCGAGCAC GACGAGGCGG AACGGGAAGT ACAGGTACTG

63701 GTGGCCGACG CCACCGGGAC CGACGTCCTG AGCGTGGGGT CGGTGACGTT

63751 GCGTCCCGCC GACATCCGGC AACTGCAGGC CGTGCCGGGT CACGACGACG 

63801 GTCTGTTCTC GGTGGACTGG ACGCCGCTGC CGCTGTCGCG GACGGATGTG

63851 TCGCAGACGG ATGCCGACGG GGATGCCGAC TGGGTTGTGC TCTCGGACGG

63901 TGTCGGCAGC CTGGCTGATG TGGTGTCGGC GGCGGGTGGT GAAGCGCCSST 

63951 GGGCAGTGGT CGCTCCCGTC GGTGCATCCG CGGGCGGCGG CCTTGCCGGC
64001 TTTGACCGCC GTGAGGGTCT TGACGGTCGG CTGGTCGTGG AGCGGGTGTT 

64051 GTCAGTCGTA CAGGAGTTCC TGGCCGCGCC GGAGCTGGCC GAGTCCCGGC
64101 TCCTCGTGCT GACCCGCGGC GCCGTGGCGA CCGGCGGCGA CGGCGACGGT 
64151 GATGTGGACG CGTCCGCCGC AGCCGTATGG GGCCTGGTCC GCAGTGCTCA
64201 GTCCGAGAAC CCGGGCCGCT TCATCCTGCT CGACGTGGAC ATGGACGTGG 
64251 ACGTCGACGT GGACATGGAC GTGGACGTCG ACGTGGACGT CGACGTGGAC 
64301 GTGGACGGAG ACGGCAATGG CAGCGACCTG GACCCGGACC TGAACGGCCG
64351 ACGACTTCCC CACGCCACCC TGCGTCACGC CGCGGAGGAA CTCGACGAGC 
64401 CCCAACTCGC CCTGCGCGAC GGACAACTGC TCGTTCCGCG GCTGGTCCGC
64451 GCCACCGGCG GCGGACTCGT CGTGGCGCCC ACCGACCGTG CCTGGCGCCT
64501 GGACAAGGGA AGCGCCGAGA CGCTGGAGAG CGTCGCGCCG GTCGCGTACC 
64551 CCGGAGTCAT GGAACCCCTG GGCCCCGGCC AGGTCCGCCT CGGCATCCAC 
64601 GCCGCGGGCA TCAACTTCCG CGACGTCCTG GTCAGCCTCG GCATGGTGCC
64651 CGGCCAGGTC GGCCTGGGCG GCGAAGGCGC CGGTGTCGTG ACGGAGACAG 
64701 GCCCCGATGT CACCCACCTG TCGGTCGGCG ACCGCGTGAT GGGCGTCCTC 
64751 CACGGCTCCT TCGGCCCGAC GGCCGTGGCG GACACCCGCA TGGTCGCGCC 
64801 GGTTCCGCAG GGCTGGGACA TGCGGCAGGC GGCCGCGATG CCCGTCGCGT
64851 ATCTGACGGC TTGGTACGGG TTGGTGGAGC TGGCCGGTCT GAAGGCGGGC 
64901 GAGCGCGTGC TGATCCACGC AGCCACGGGT GGTGTGGGAA TGGCGGCGGT
64951 GCAGJlTCGCC CGTCACCTGG GTGCCGAGGT GTTCGCCACC GCCAGTGCAG

65001 CCAAGCACGT CGTACTGGAA GAGATGGGCA TCGACGCCGC CCACCGCGCC 

65051 TCCTCCCGGG ACCTCGCCTT CGAGGACACC TTCCGGCAGG CCACCGACGG 

65101 GCGCGGCATG GACGTCGTCC TCAACAGCCT GACCGGCGAG TTCATCGACG 

65151 CATCTCTGCG GTTGCTCGGC GACGGCGGCC GGTTCCTGGA GATGGGCAAG 

65201 ACCGATGTGC GCACGCCGGA GGAGGTGGCC GCGGAGTACC CGGGTGTdkc
65251 CTACACCGTG TACGACCTCG TCACCGACGC GGGGCCGGAT CGCATCGCGG 

65301 TCATGATGAG TGAGCTGGGC GAGAGGTTCG CTTCCGGTGC CCTTGACCCT
65351 CTGCCGGTGC GTTCCTGGCC GCTGGACAAG GCGCGTGAGG CGTTCCGGTT
65401 CATGAGTCAG GCCAAGCACA CCGGCAAACT CGTACTCGAC GTGCCCGCAC 
65451 CGCTCGACCC CGACGGGACC GTCCTGATCA CCGGAGGCAC GGGGGCGCTG 
65501 GGGCAGGTCG TGGCCGAGCA TCTGGTGCGG GAGTGGGGCG TACGGCACCT 
65551 GGTGCTGGCC AGCCGCCGTG GACTGGACGC CCCCGGCAGC GGTGAACTCG 
65601 CCGACAGGCT GTCGGACTTG GGCGCCGAGG TGACCGTCGC GGCGGCCGAT 
65651 GTGAGCGACC CGGCCTCGGT GGTGGAGCTG GTCGGCAAGA CGGATCCCTC 
65701 GCATCCGTTG ACGGGTGTCG TGCACGCGGC GGGCGTGCTT GAGGACGGGA 
65751 TCGTGACGGC TCAGACGCCT GAGGGGCTGG CGCGGGTGTG GGCGGCCAAG
65801 GCCGCTGCGG CGGCGAATCT CCATGAGGCG ACCCGGGAGA TGCGTCTCGG 
65851 TCTGTTCGTG GTGTTCTCCT CGGCGGCCGC CACGCTCGGC AGTCCGGGCC 
65901 AGGCCAACTA CGCGGCTGCC AATGCCTATT GTGACGCGCT GATGCAGCGC
65951 CGACGGGCGG CGGGCCAGGT CGGCCTGTCG GTCGGCTGGG GTCTCTGGGA 
66001 GGCACCGGAC GCCAAGCCGG GTGTTGCCGC CGACGCCAAA CCGGATGTTG 
66051 CCGCCGACGC CAAGACGGGA GTTGCCGCCG ACGGCACTCC CCAGGGCATG 
66101 ACGGGCACCC TGAGCGGCAC GGAGGTGGCC CGCATGGCAC GCATCGGCGT 
66151 CAAGGCGATG ACCAGCGCAC ACGGTCTCGC CCTGCTCGAC GCCGCACACC 
66201 GOCACGGCCG CCCCCACCTC GTCGCCGTCG ACCTCGACAG CCGCGTCCTG 
66251 GCGCACAAAC CCGCCCCGGC CCTCCCCGCC GTCCTGCGCG CCTTCGCCGG
66301 AGACCAGGGA GGCCAGGGAG GCGGCCGAGG CGGCGGTCGG GGCGGCGGCC
66351 CGGCACGACC GGCGGCGGCC ACCACCCGGC AGAACGTCGA CTGGGCCGCG 
66401 AAGCTCTCCG TCCTGACAGC CGAGGAACAG CACCGCACCC TCCTCGACCT 
66451 GGTACGGACG CACGCOGCAG CCGTCCTCGG GCACGCGGGC ACCGACGCCG 

66501 TACGCGCCGA CGCCGCCTTC CAGGATCTCG GCTTCGACTC CCTCACCG^G 

66551 GTCGAACTGC GCAACCGCCT CTCCGCCTCC ACCGGCCTGC GCCTGCCCGC 
66601 CACGTTCATC TTCCGGCACC CGACCCCGTC GGCCATCGCC GACGAACTGC

66651 GCGCACAGCT GGCCCCCGCG GGGGCCGACC CGGCCGCGCC GCTCTTCGGT 

66701 GAACTGGACA AGCTGGAGAC C3GTGATCACG GGGCACGCGC ACGACGAGAG 

66751 CACCCGGACC CGCCTGGCGG CACGCCTGCA GAACCTGCTG TGGCGCCTGG 

66801 ACGACACTTC GGCCCGCTCG GACCACGCGG CCGGCGCGAG CGACGCCGAC 

66851 GGCGACGCCG TCGAGAACCG AGACCTCGAG TCCGCGTCGG ACGACGAGGT 

66901 CTTCGAGCTG ATCGACCGAG AACTGCCTTC TTGATCAGGA GTGGAGAAGA 

66951 CATGCCGGGT ACGAACGACA TGCCGGGTAC CGAGGACAAG CTCCGCCACT 

67001 ACCTGAAGCG AGTGACCGCG GATCTCGGAC AGACCCGTCA GCGCCTGCGC 

67051 GACGTGGAGG AGCGCCAGCG GGAACCGATC GCCATCGTCG CGATGGCCTG 

67101 CCGCTACCCG GGCGGGGTGG CCTCCCCCGA GCAGCTGTGG GACCTGGTCG 

67151 CCTCACGCGG CGACGCCATC GAGGAGTTCC CCGCCGACCG CGGCTGGGAC 

67201 GTGGCGGGCC TCTACCACCC CGACCCGGAC CACCCCGGCA GGACCTATGT

67251 ACGAGAGGCC GGATTCCTGC GGGACGCCGC CCGCTTCGAC GCCGACTTCT 

67301 TCGGCATCAA CCGGCGCGAG GCGCTCGCCG CCGACCCGCA GCAACGGGTG 

67351 CTCCTCGAAG TGTCGTGGGA ACTGTTCGAG CGGGCGGGCA TCGACCCCGC 

67401 CACGCTCAAG GACACCCTCA CCGGCGTGTA CGCGGGGGTG TCCAGCCAGG 

67451 ACCACATGTC CGGGAGCCGG GTCCCGGCGG AGGTCGAGGG CTACGCCACC 

67501 ACGGGAACCC TCTCCAGCGT CATCTCCGGC CGCATCGCCT ACACCTTCGG
67551 CCTGGAGGGC CCGGCGGTGA CGCTCGACAC GGCGTGCTCG GCATCGCTGG

67601 TCGCGATCCA CCTCGCCTGC CAGGCCCTGC GCCAGGGCGA CTGCGGCCTG 

67651 GCGGTGGCGG GAGGCGTGAC CGTACTGTCC ACGCCGACGG CGTTCGTGGA 

67701 GTTCTCACGC CAGCGCGGAC TCGCACCGGA CGGCCGCTGC AAGCCGTTCG

67751 CCGAGGCCGC CGACGGCACC GGATTCTCCG AGGGCGTCGG CCTGATCCTC 

67801 CTGGAACGCC TCTCCQACGC CCGCCGCAAC GGACATCAAG TACTCGGC&T 

67851 CGTACGCGGA TCGGCCGTCA ACCAGGACGG CGCGAGCAAC GGCCTGACCG 

67901 CCCCGAACGA CGTCGCCCAG GAACGCGTGA TCCGCCAGGC CCTGACCAAC 

67951 GCCCGCGTCA CCCCGGACGC CGTCGACGCC GTGGAGGCAC ACGGCACCGG 

68001 CACCACGCTC GGCGACCCGA TCGAGGGGAA CGGACTCCTC GCGACGTACG 

68051 GAAAGGACCG CCCCGCCGAC CGGCGGCTGT GGCTCGGCTC TGTGAAGTCG

68101 AACATCGGCC ACACGCAGGC GGCTGCGGGC GTCGCAGGCG TCATCAAGAT 

68151 GGTGATGGCG ATGCGCCACG GCGAGCTGCC CGCCTCCCTG CACATCGACC 

68201 GGCCCACGCC CCACGTGGAC TGGGAGGGCG GGGGAGTGCG GTTGCTCACC 

68251 GATCCCGTGC CGTGGCCACG GGCCGACCGC CCCCGCCGCG CGGGGGTCTC 

68301 CTCCTTCGGC ATCAGCGGCA CCAACGCCCA CCTGATCGTG GAACAGGCCC

68351 CCGCCCCGCC CGACACGGCC GACGACGCCC CGGAAGGCGC CGCAACeCCC 

68401 GGCGCTTCCG ACGGCCTCGT GGTGCCGTGG GTGGTGTCGG CCCGTAGTCC

68451 GCAGGCCCTG CGTGATCAGG CCCTGCGTCT GCGCGACTTT GCCGGTGACG 

68501 GGTCCCGAGC GCCGCTCACC GACGTGGGCT GGTGTTTGCT GCGGTCGCGT

68551 GCGCTGTTCG AGCAGCGGGC GGTGGTGGCG GGGCGTGAGA GGGCTGAACT 

68601 GCTGGCGGGG CTGGCTGCGT TGGCCGCTGG TGAGGAGCAC CCGGCTGTGA

68651 CGCGGTCCCG TGAGGAAGCG GCGGTTGCTG CGAGCGGTGA TGTGGTGTGG 

68701 CTGTTCAGTG GTCAGGGCAG TCAGTTGGTC GGTATGGGTG CTGGTTTGTA 

68751 TGAGCGGTTC CCGGTGTTTG GGGCTGCGTT TGATGAGGTG TGCGGCTTGC

68801 TGGAGGGGGA <5CTGGGGGTT GGTTJIXSGGTG GGTTGCGGGA GGTGGTGTTC 
68851 TGGGGCCCGC GGGAGCGGTT GGATCACACG GTGTGGGCGC AGGCGGGGTT

68901 GTTTGCGTTG CAGGTGGGGT TGGCCCGGTT GTGGGAGTCG GTCGGGGTGC 

68951 GGCCGGATGT GGTGCTCGGG CATTCGATCG GTGAGATCGC GGCCGCGCAT 

69001 GTGGCGGGGG TCTTTGATCT GGCGGATGCG TGTCGGGTGG TGGGGGCGCG 

69051 GGCGCGTTTG ATGGGTGGGT TGCCTGAGGG TGGGGCGATG TGTGCGGTGC 

69101 AGGCCACGCC CGCCGAGCTG GCCGCGGATG TGGATGGCTC GTCCGTGAGT 

69151 GTGGCGGGGG TCAACACACC TGACTCGACG GTGATTTCAG GTCCGTCGGG 

69201 TGAGGTGGAT CGGATTGCTG GGGTGTGGCG GGAGCGTGGG CGTAAGACGA

69251 AGGCGCTGAG GGTGAGTCAT GCTTTCGATT CGGCGTTGAT GGAGCCGATG 

69301 CTCGGGGAGT TCACGGAAGC GATACGAGGG GTCAAGTTCA GGCAGCCGTC 

69351 GATCCCGCTC ATGAGCAATG TCTCCGGAGA GCGGGCCGGC GAGGAGATCA 

69401 CATCCCCGGA GTACTGGGCG AGGCATGTAC GCCAGACAGT GCTCTTCCAG 

69451 CCCGGCGTCG CCCAAGTGGC CGCTGAGGCA CGCGCGTTCG TCGAACTCGG 

69501 CCCCGGCCCC GTACTGACCG CCGCCGCCCA GCACACCCTC GACCACATCA 

69551 CCGAGCCGGA AGGCCCCGAG CCGGTCGTCA CCGCGTCCCT CCACCCCGAC 

69601 CGGCCGGACG ACGTGGCCTT CGCGCAeGCC ATGGCCGACC TCCACGTCGC 

69651 CGGTATCAGC GTGGACTGGT CGGCGTACTT CCCTGACGAC CCCGCCCCCC 

69701 GCACCGTCGA CCTGCCCACC TACGCCTTCC AGGGGCGGCG CTTCTGGCTG

69751 GCGGAGATCG CGGCGCCCGA GGCCGTGTCC TCGACGGACG GTGAGGAGGC 

69801 CGGGTTCTGG GCCGCCGTCG AAGGTGCGGA CTTCCAGGCG CTCTGCGACA 

69851 CCCTGCACCT CAAGGACGAC GAGCACCGCG CGGCTCTGGA GACGGTGTTC 

69901 CCCGCGGTGT CCGCGTGGCG GCGCGAACGA CGTGAGCGGT CGATCGTCGA

69951 TGCCTGGCGG TACCGGGTCG ACTGGCGGCG CGTCGAGCTG CCGACACCCG 

70001 TTCGGGGCGC CGGTACGGGT CCCGACGCCG ACACGGGCCT CGGGGCGTGG 

70051 CTGATGGTGG CTCCCACGCA CGGGTOGGGT ACTTGGCCX3C AAGCCTGTGC

70101 CCGGGCGTTG <5AGGAGGCGG GGGCGCCGGT ACGTATCGTC 6AGGCCGGCC 
70151 CGCACGCCGA GCGGGCGGAC ATGGCGGACC TGGTCCAGGC ATGGCGGGCA

70201 AGCTGTGCGG ACGACACCAC CCAGCTCGGA GGAGTGCTCT CCCTGCTGGC 

70251 TCTCGCCGAG GCACCGGCCA CCAGTTCCGA CACCACTTCC CACACCAGTA 

70301 CCAGTTGCGG TACCGGCTCT CTCGCGTCCC ACGGCCTCAC CGGCACCTTG 

70351 ACGCTGCTGC ACGGTCTGCT GGATGCGGGC GTCGAAGCGC CTCTCTGGTG 

70401 TGCCACGCGC GGCGCCGTGT CGTGCGGCGA CGCCGATCCG CTCGTCTCTC

70451 CGTCGCAGGC CCCGGTCTGG GGACTCGGAC GCGTGGCCGC CCTGGAGCAT 

70501 CCGGAGTTGT GGGGCGGCCT ^TCGACCTG CCCGCCGACC CGGAGTCGCT 

70551 CGACGCGAGC GCGTTGTATG CGGTTCTGCG CGGAGACGGC GGCGAGGATC

70601 AGGTCGCGCT GCGCCGGGGC GCGGTCCTCG GCCGTCGCCT GGTGCCCGAG 

70651 GCAACCCCGG ACGTGGCCCC CGGCTCGTCC CCGGACGTGT CCGGAGGCGC 

70701 AGCCCATGCC GACGCGACCT CCGGGGAGTG GCAGCCGCAT GGTGCCGTCC 

70751 TCGTCACCGG AGGCGTCGGC CACCTGGCCG ATCAGGTCGT ACGGTGGCTC 

70801 GCCGCGTCCG GCGCCGAACA CGTCGTACTC CTGGACACGG GCCCCGCCAA 

70851 CAGCCGTGGT CCCGGCCGGA ACGACGACCT CGCCGCGGAA GCCGCCGAAC 

70901 ACGGCACCGA GCTGACGGTC CTGCGGTCCC TGAGCGAGCT GACAGACGTA 

70951 TCCGTACGTC CCATACGGAC CGTCATCCAC ACATCGCTGC CCGGCGAGCT

71001 CGCGCCGCTG GCCGAGGTCA CCCCCGACGC GCTCGGCGCG GCCGTGTCCG 

71051 CCGCCGCGCG GCTGAGCGAA CTCCCCGGCA TCGGGTCAGT GGAGACCGTG 

71101 CTGTTCTTCT CCTCCGTGAC GGCTTCGCTC GGCAGTAGGG AGCACGGCGC 

71151 GTACGCCGCC GCCAACGCCT ACCTCGACGC CCTGGCGCAA CGGGCCGGTG

71201 CCGATGCTGC GAGCCCCCGG ACGGTCTCGG TCGGGTGGGG CATCTGGGAT

71251 CTGCCGGACG ACGGTGACGT GGCACGCGGC GCCGCCGGGC TGTCCCGGAG 

71301 GCAGGGACTC CCGCCGCTGG AACCGCAGTT GGCGCTGGGC GCCCTGCGCG

71351 CGGCGCTCGA CGGGGGCAAG GGGCACACGC TGGTCGCCGA CATCGAGTGG

71401 GAGCGGTTCG CGCCGCTGTT CACGCTGGCC AGGCCCACCC GGCTGCTCGA 
71451 CGGGATCCCC GCGGCCCAGC GGGTCCTCGA CGCCTCCTCG GAGAGCGCCG

71501 AGGCCTCGGA GAACGCCTCG GCCCTCCGTC GCGAACTGAC GGCCCTGCCC 

71551 GTGCGGGAGC GGACCGGGGC ACTTCTCGAC CTGGTCCGCA AACAGGTGGC 

71601 CGCCGTCCTG CGCTACGAGC CGGGCCAAGA CGTGGCGCCC GAGAAGGCCT

71651 TCAAGGACCT GGGCTTCGAC TCGCTCGTGG TCGTGGAGCT GCGCAACCGG

71701 CTGCGCGCCG CCACCGGGCT CCGGCTGCCC GCCACCCTGG TCTACGACTA 

71751 CCCCACACCC CGCACCCTCG CCGCACACCT GCTGGACAGG GTGCTGCCCG 

71801 ACGGCGGCGC GGCAGAGCTC CCCGTGGCCG CCCACCTGGA CGACCTGGAG 

71851 GCGGCCCTCA CCGACCTGCC GGCCGACGAC CCCCGGCGCA AGGGCCTGGT

71901 CCGGCGTCTA CAGACGCTGC TGTGGAAGCA GCCCGACGCC ATGGGGGCGG

71951 CGGGCCCCGC CGACGAGGAG GAGCAAGCCG CGCCCGAGGA CCTGTCGACC 

72001 GCGAGCGCCG ACGACATGTT CGCCCTGATC GACCGGGAGT GGGGCACGCG 

72051 GTGAGCGGGG TGGAGCGGGG TGTGGGGTCG GCGGGCCCTG TGGAACAGGG 

72101 TGACGGACTC GCGGGCCTGG TCGAGCGGGC CGAGGCGCTG GCCGCTCTGC 

72151 GGGGCGCCTT CGACGGCTCC CCGGGCACCG GCGGCAGCCT CGTCGTGCTC

72201 AGCGGCGCGG TGGGCACCGG CAAGACCGCG CTGCTACGGG CGTGGGCCGA 

72251 CCGCATCGGC GCCGATGCCG ACGCCCTGGT CCTGACCGCC ACCGCCTGCC 

72301 GCGCCGAGCG CGACCTGCCG CTTGGCGTCC TGGAACAGCT GGTACGCAGC 

72351 CCCGGCCTGC CCCCGGCCAG CGCCGAGCGC GCGCTGGCGT GGTGGGACGA 

72401 GGAGGCCTCG GCCACCCCCG GAAAGACGGA CGCGAACGGG ACGAGTGCCA 

72451 ACGGGACGGA CGCCAACGGG ACGGGCGCGG GACAGACGGG CGCGGGGCAG 

72501 GCGGGCGTGG GACAGACGGG CGTGGGCGGA GAGCCCGTCC TGGCCGCCTC 

72551 CGCCCTGCGA GGCCTGTGCG AGGTGCTGCG GGACCTGCTC GCCGAGCGGC 

72601 CCGTCGTGGT CGCCGTCGAC GAOGCGCACC ATGCCGACGC GGCGTCGCTC 

72651 CAGTGCCTGC TCTCCGTGGT GCGCCGGCTG GGGTCGGCAC GACTCCATGT 

72701 GCTGTTCACC GAGTACGOCC ATCAGAAGGC GCAGAACGCC CTGCTGAGCA 
72751 GCGAGTTCCT GCACGAGCCC GCCCTGCGGC GGATCCGCCT GGAACCGCTG

72801 TCGAAGGCGG GCGTGGAGGC CTTGCTCGCC CGGCACCTCG ACGAGCGGAC

72851 GGCACAAGAC CTCACCCCCG TCGTCCACGG CATGAGCGCG GGCCACCCGC

72901 TCCTCGTACG GGCGCTGGCC GAGGACCACC GTGCGGCGGG CGGCGCCGGG 

72951 GAGGCGTACG GTCGTGCCGT CCTCAGCTTT CTGTACCGGC ACGAGACTCC 

73001 GGTCACCCAA GTCGCCCGCG CCATCGCTGC GTTGGGCGCG CACGCCGGAC 

73051 CCGGTCAGGT CGGGCGGCTG CTCGATGTCG ACGCGGCGTC CGTCGAGCGG 

73101 GCCGTGCGGC AGCTGACCGT CGCGGAGGTG CTGCACGAGG GCCGCCTGTG 

73151 CCACCCGGCG TTCGCGGCGG CGGTCCTGGA CGGCATGCCG CCCGAGGAAC 

73201 GCCGCGCCCT GCACGGACGG GTCGCCGACC TCCTGCACGA GGAGGGGGCG 

73251 CCGGCCACCG AAGTGGCCGC CCACCTCGTC GCCGCCGACC GGTCCGACGC 

73301 CCCGTGGGCG GTACCCGTCT TCCAGGAAGC GGCCCAACTC GCCCTGGACG 

73351 AGGACCAGGT GGAGACCGGC GTCGACTATC TGCGCGCGGC CCACCAGCGG 

73401 TGCCGGGGCG CCGCGCAGCG TGCCGCGGTG GTCGGTGCGC TCGCCGACGC 

73451 CGAGTGGCGG CTCGACCCAG CAAAGGTCCT GCGCCACCTG CCCGACCCTG 

73501 CAGCCATGGC CCCACAAACG GACCCTGCCG CCCTGGCCCC ACACACGGAC 

73551 CCCGCACCCA CAGCCGCACC CACAGCCGCC CCCACCCCCA CCCCCATCCC 

73601 GACCACCCCA CCCCTCCCCA CCCACCTGCT CTGGCACGGG CGGGTCGAGG 

73651 AAGGCCTGGA CGCCATCGGC ACGCTCACCG GGCCCGGACC CAACCCGGCG 

73701 GGTGCGCCGC CGATGAACCC CGCGGACCTG GACACCCGAT GGCTGTGGGG 

73751 CGCCTACCTC TATCCCGGGC ACGTCAAGGA GCGCCTGGGA TCCGGCGCCC 

73801 TGTCCCCGCA GCGCTCGACC CCGCCGGCGG TCACGCCGGA GCTCCAAGGC

73851 GCGGGCACGC TGATGAACGA CCTGCTGCAC GGCGGCGAAC GCGACGCCAC 

73901 CGAGGCCGCC GAGCGCGCCC TCAACCGCTA CCGGCTCGGC CCCGGCACCA 

73951 TCGCGGTCCA GAGGGCCGCG CTGGCGGCCC TCACCTACCG CGACCGGCCG 

74001 CACCGCGCGG CCGCCTGGTG CGACGGCCTC GTCGCCCAGG CCGACGAGCG 
74051 CAACAGCCCC ACCTGGCGGG CCCTGTTCAC CGCGTGGCGT GCCCTGCTCC

74101 ACCTGCGGCA GGGCGACCCG GCCGCAGCGG AACAGCGCGC CGAAACCGCC 

74151 CTCGCCCTGC TCGGATCGAA GGGCTGGGGC GCCGCGATCG GCCTGCCGCT

74201 GGCAGCCGCC GTACAGGCCA AGGCGGCCCT CGGCGATGTC GACGGGGCGG 

74251 CGGCCCTCCT GGAACGGCCC GTGCCCCAGG CGGTCTTCCA GACCCGCACC 

74301 GGACTGCACT ACCTGGCGGG CCGGGGCCGC TATCACCTCG CCACCGGCTG 

74351 CCACTACGCC GCACTGTGCG ACTTCTACGC CTGCGGGACC CGCATGAGCA 

74401 GCTGGGGAGT GGACCTGCCC GCGCTGGAGC CGTGGCGCCT CGGCGCGGCG

74451 GAAGCGTACC TGGCCCTCGG CGAAGGACTC CTGGCACGCC AACTCGTCGA

74501 CGGCCAGCTG CCGTTGCCCA CGCCTGACGA CGGCCGCACC TGGGGCATGA 

74551 CGTTGCGCCT GCGGGCGGCC ACGTCCCCCG CGCCGGCCCG GGCCGAACTC 

74601 CTCGACGAGG CCGTGGCGGT GCTCCGGGAG AGCGGCGACA CCTTCGAGCT 

74651 GGCGCGGGCC GTCGCCGACC AGGCTGTTGC CGTACGCGAA GGGGGCGAGG

74701 CGGAACGCGC CCGGCTGCTG GCCCGCAAGG CGGAGCTGCT GGCCCGGCGC 

74751 TGGGGCAGCG CCCCCGCGCC CGCCACCGTC CCCGAACCGC CGGAGCGGCC 

74801 AGGACCGGCC ACTCCGGACG CCGAACTGAC CAGTGCGGAG CGGAGGGTGG 

74851 CCGAGCTGGC CGCCGAAGGG TTCACCAACC GGGAGATCTC CCGGAAGCTG

74901 TGCGTCACGG TCAGCACCGT GGAACAGCAC CTGACCCGGA TCTACCGGAA 

74951 GCTCGACGTC AGGCGACTGG ACCTCCAGGC AGCCCTCGGC TGACCTTCAG 

75001 GCGGCCCTCG GCTGACCGCA GGCCACGCGC CTACGGTCAG CCTTCCTGAG 

75051 TCAGGACCGT ACAGCCGCCG TAGGTGTAGG TGTAGGCGTG GGCGAGATCG 

75101 TCGCCGCGTC CAGACCCACC ACGGCCAGCT CCTCCGGAAG GAACGGGGGA 

75151 GCGGTCAGCT CCGGGAGGCG TTCGTCGGCG CGCATCGCCA TCAGGAAACG 

75201 GTTGGAGCCC AGTTCGGCCT GGGGCGCGTT GAGGCTCATC ACGTCCGTGA 

75251 CGATCTCGGA CGCCTTCGGG GAACGGATCG ACGCaSCGGT GATGGCCTCG

75301 GCGAACCGCA GACGCTGCTC GGTGTCCACA CCGATGAGCC GCGGATCCGT
75351 CGCCGAGACA CGGCAGTTGA CGTAGTCGAT GTCCTTGGTC GCGGCGAGGA

75401 TCCACGGGTC GTCCACGGCC GCGCCGATCG CCTTCTGCAG GGCGCGGGTG 

75451 CCGGCGCGGG CGGACCCCGT ACCCTCCTGC ACGCTCCGCT CGAACTCGCG 

75501 GTCGATCGTG GTGGCGCAGC GCGCGGCCGA GCTCATGCCG TGGCCGTAGA

75551 TCGGGTTGAA AGCGGTCAGC GAGTCGCCGA TGACGAGCAG ACCGTCGGGC 

75601 CACTGTTCGA GGCGCTCCGG ATAGAGGCGG CGGTTGGCGC CGGAGCG&GA

75651 ACCGAAGACG GGGGTGAGTG GTTCGGCGTC CCGGAGCAGG TCGGCGAGGA 

75701 TCGGGTGGTT CAGGTTCTCG GCGAAGGGGA TGAACTCGTC CTCGTGTGTG 

75751 GGCAGTTGCG CGCCCCGCGT GCAGGAGAGC GTCGCGAGCC AGCGGCCGCC 

75801 CTCGATGGGG TAGACCAGGC CGAAGCGGCC GGGTTCGCGC ACCCGGTCGT 

75851 CGGCGGCGAT GTTCACGGCG GGGAAGTGCG TCGTAGCGCC CGGCGGGGCC 

75901 TTGAAGAGCC GGGTGGCGTA GGCGACGCCC GCGTCCACGA CGTCTTCCTC 

75951 CAGTGCCGGC ACGCCGAGGG CGGCGAGCCA CTGCTTGAGG CGGGAGCCGC 

76001 GCCCGGTGGC GTCGATCACC AGGTCGGCCT CCAGCTGCTC CTGGCGACCG

76051 CTGTCGAGGT CGCGGACGAC GACACCGGTG ACCCGGCCGC CACTGCCACC

76101 ACCACTTCCC CTCAGCTCGA CGGCCTCGGT GCGCTGCCGG ACGGTGATGT 

76151 TGTCGGCTCC CAAGGCCTGC TGACGTACCG TCAAGTCCAG CAGCGGGCGG 

76201 CTGGCGACCA GCGCGAACTG GGTGGCGGGG AAGCGGTGCT GCCACCCCTG 

76251 AGCGGTCAGC GTCACCAGGT CCTCGGGGAA GCCGAGGCGG CGGGCGCCGG 

76301 CCGCGAGGAG GCGGTCGGTG GTGCCGGGCA GCATCTCCTC GATGAGGCGG

76351 GCGCCGTTGG ACCACAGGAG GTGCGCGTGG CGGGCCTGCG GGACCCCCTT 

76401 GCGGTGCTGG GGCTCCTCGG GCAGCGCGTC ACGTTCCACG ACGGTGACGG 

76451 CGTCGACGTG CCGGGCCAGG ACGTGGGCCG <:CAGGGTGCC TGCCATGCTG 

76501 GCACCCAGGA CGACGGCATG TGCGGGTCGG GTGGTGGTCA CGCGCGTATC

76551 CCTTCGGGGT GGGTGGTGTC GGCGGGCCCG OCCGGATCGT CCATGGTCAC

76601 GTCCGTGACG CCCCAGAACG CCTGGACCCG GCGGCCGAGC CCGTGCTCGT
76651 CGAGTTCGAC GATGCCGACG ATGCGGAAGG TCATCGGCCG CGGCCGCTGC

76701 ACGGTGACCG TGGTCGGCGT CACCACGAAA CGGTCGTCCA TCGACGTCAT 

76751 CGGCGGGTCC GGCACCTCGT GCGTACCGCA GGAGACGGCC AGTTGGAGAT

76801 GGCGGCGGAG ATCGTCCTTG CCCACCATCG GGGGCCGCCC CACGGGGTCC

76851 TCGAAGACGA TGTCGTCCGT GAACAGGTCG AGGACGCCTT CGATGTCACC

76901 GGCGTTGATG CGCTCGGCGT AGTCGACGGC CATCTGCTTG CGCGCGGCCT 

76951 CGTCGGGCAT GGCACCTCCA GGAAGGGTGG GCAGACCTTG TGAAAGTCAT 

77001 CGAGGGCCGT TCGGTTCAGC CGAGGACCGT GAGATCGGAT GTGCCCCAGT 

77051 ACGACTTCAG ATGCCGGATG AGGCCGGACG CGTCGATGCG GATCACGAGC 

77101 ATCGCCGTGC GGTGTATGCG GGCCGTCCCC GGGGCGTCGG GGGCCTTGAG 

77151 CCAGCCCCGC TCCGCGTAGA GCGGGCCCAC GGGCAGGTAG TCCATGACGG

77201 AGGAAATCTG GATCAGCGCG TGCGTGGCGT CCTGCCCGGC GACGGGCTCG 

77251 GCCGCCTCCT CGCGCAGGTG CGCGGCGAGC AGCGGTTCGT AGTGGGCGCG 

77301 CAGCGCGTCG TGCCCGGTGA CGGGCGGGAG GCCGACCGGG TCCTCGAGGA 

77351 CCGCGTCGGG CGCGTACAGA TCGATGATCG CGTCCAGGTC CCCGGCGTTG 

77401 ATCCGCCGGC TGTGCTCCAG GGCCCGCTTC TTGCGGGCGA ACTCGTTCAT 

77451 CGCTGCCCCT CCACTGCCTG ACCGTGTCCG TTGCCGTTGC CGTTGCCGTT 

77501 GCCGTTGCCG TGTCCGTTGC CCTGCCCGGT GGGCTGTCCG TTGCCCTGTC 

77551 CGCTCGCGCC GTCCCTGCCG AGGTCCCGGT CGATGAACGC GAAGATCTCG 

77601 TCCGCCGACG CGTCCTGGAT ACGTGTACGA GTGGCCAGCG GGACCTCGCC 

77651 GGCOGTGTCC TGCGGCGCGT CGAGCCTGGC CAGCGTCGCG CGCAGCCGCC 

77701 CCGCCAGTTC GGCCCGCGCC GAGCCGTCCT TCGAGGAGAC CGAGAGCAGC 

77751 GAGTCCTCGA TGCGCTCGAA CTCCGCCAGG ACGTCGGCGA GCGGATCCGC 

77801 CGCGCGCGGG GCCAGCTCCT GCCGCAGCTG CGCGGCGAGC TCCGCCGGGT 

77851 TGGGATGGTC GAAGACGAAC GTCGCGGGCA GCTTCAGCCC CGTCGCGGCC

77901 GAGAGCCGGT TGCGCAGCTC CACCGCGGTC AGGGAGTCGA AGCCGAGTTG

WO 01/68867 PCT/GB00/02072

77951 CCGCAGCCCC TGCGTCGCGT TGACGGGCGT GGCCGCGTCG TAGCCGAGGA 

78001 CGGCCGCGAT ATGGGTGCAC ACCAGGTCGA GCAGCGCCTC CTCCCGCTCG 

78051 GGGTCGGACA TCGCGCCGAG CGACTTGAGC AGCGCGGCCG CCCCCGCCGA 

78101 CACGGCACCG CCGCCGCTCT TGCTCCCCCC GCGCACCAGG TCGCGCAGCA

78151 GCGCCGGTGC GGGGTGGCTC TGGGCCTGCC GGCGCATCCG GGCCAGGTCC 

78201 AGACGGACCX3 GCGCGTACAG GGGCAGTCCG CCGGCCCACG CCGCGTCdAG 

78251 GAGGGCGAGT CCCTCGTCGG CGCCGAGCCC GACCACGCCG GCGCGGGCAT 

78301 GGCGCGCCCG GTCGGCGTCG GTGAGCCGTC CCGACATGCC GCTCGCCAGC 

78351 TCCCAGTAGC CCCACGCCAG GGAGGTCGCC GCCGCACCGC CGTCGTGCCG 

78401 GTGCCGGGCC AGCGCGTCCA AGAAGGCGTT GGCGGCCGTG TAGCTGCCCT 

78451 GGCCGGGGCC GCCGAGCAGC CCGGCGACCG AGGAGTACAG GACGAACGCG 

78501 GACAGGTCCG CGTCCCGCGT CAGCTCGTGC AGGTGCCACG CGGCGTCCGC 

78551 CTTCACGCGC ATCACCTCCT CGACCTGCTC GGCCGTGAGG TTCTGCACCA 

78601 CGGCGTCGTT CACGGTGCCC GCGCAGTGGA AGACGGCGGT CAGCGGGTGG 

78651 TCCGAGGGCA CCGCCGCGAG GAGGGCGGCG GCTTCGTCCC GGTCGCCCGG 

78701 GTCGCACGCG GCGAAGGTGA CTCGCGCGCC GAGCGCGGAG AGGTCGGCGG 

78751 CCAGTTCGAG TGCGCCCGGC GCGTCGGCTC CCCGCCTGCT GGACAGCAAC 

78801 AGGTGCCTGG CTCCGTACCG TTCCACCAGG TGACGGGCCG TCAGCGAGCC 

78851 GAGTGCTCCG GTGCCGCCGG TGACCAGCAC GGTGCCCTCG GGGTCGAAGG 

78901 ^CGGGAGGCAG CGAGAACACG gtcgtgcccg CCGAGGGCGG GGCCGCCATC 

78951 GCGGCGGGCG CCTGCCGGAT GTCCCACACG GTGATGTCGA GCGGCGTCAG

79001 AGCACGGCTG TCACCCCGTT CCGCGGGCAG CCCGGGCTCC GCCGACTCCG

79051 tgatctcggc aagctcggtc agctcx3gtca gctccgcgag gatttcccgt 

79101 AGGCGCCCGG GCTCGGGCGG CACGACAGCC TGTCCCTCGT CCGGACGACC

79151 GCCQGCACQG TGGACCACCA GGGCCCCCTC GTGGCGGAGG GTGACGTCGG 

79201 CCGGGCGCTC GGCGCCCGAA TCATCCGCCG TCGACGCACC GTCCACGGCC
79251 TCGACCCGCC ACCGCCCGGC AAGAGCAAGA CGCAGCACGG CACGGCCGAC

79301 GGAACCGGTT TCCTCCCCGA CGAGCAGAGT CTCGCCGCCG CGCGGCGCCA 

79351 CGACATCCGC CAGCACGTGA TACGCGGACA CATAGGCCCC CAAGGACCCG 

79401 GCCGCCTGCG CCCAACTCCA GCCCGCCGGA ACCGGCATAA GCAGCGCGGC 

79451 ATCGGTGACG GCCACCGGGC CCACCGCGTC GAACAACCCC ATCACCCGGT

79501 CGCCCACGGC CACCGAACCG ACCTCGCCGC CGACTTCCGT CACCACACCG 

79551 GCACCCTCGA CCTGGCCCGC CGTGAGCGGC CCCGGCGCCG CGGCCCGCAC 

79601 CGCCACCCGC ACCTCGTGCG GCTCCAGCGC CCGTCCGGCC TCGGGAGCGT 

79651 CGACAAGGGA CAACTGCTGT CCGCCGCCCG CCTCTTGGCA CCGAGCCAGC 

79701 CGCCACGTGA GCGATCCGAC CGGCGGCACC AGCCGCACCG ACGCGTCGTC 

79751 GCGCACGAGC CGTGGCACGT AGGCGCGCCC GTCACGCAGC GCCAATTCCG 

79801 GTTCGCCGGA GGCCAGTACG CCGGTCAGCG TGGCCGGAGA AGACTCCAGT 

79851 CCGTCCACGT CGAGCAGCGT GAGGCGACCG GGATTCTCGG CCTGCGCGCT 

79901 GCGCACCAGA CCCCACAGCG ACGCGCCCGC CAGATCACCG GCGGTCTCAC

79951 CCGGCCGCGC GGCGACCGCG CCTCGGGTGA CGACGACGAG ACGGGTCGCC 

80001 GCGAACGCCG GGTCGTCCAC CCACTCCTTG AGCAGCGACA GAAGGGACAC 

80051 GGTGGCCAGC CGCGCGTACC CGGCCGGGTC GCCGCCCCTG CCATCGGCAT 

80101 CCGCAACGGC CCCGGCACCT GCGCCGGGCG CGGCGCACAC GGGGAGCACG

80151 ACATCGGGCG CTTCGCCCCC AGCCGCCACT CGGTCCCGGA GCGCACCGAA 

80201 CGTGTCCCAC ACGGGGCCGG CGGCCAGCGC ATCGGACAAG GCGTCGGCCA

80251 GCGCACCGGC CGACGTACCG CCCATCGGGC CACTCTCGAC CGGCGCGAGG 

80301 ACCGCGGCAC GCGGGGCGCC GCCGCCCGTC TCCTCGGCCC GCGCGGCGAC 

80351 CTCCATCCAC ACGAGCCGGA ACAGCGCGTC ACGGTCCGCC GCACGGGCGC 

80401 CCGCGATCTG GTGGGCGGCC ACCGGCCGTA CCGTGAGCGA CTCCAGCGTG

80451 AGAACCGGCT CCCCGCCTCC GCCCCGGTCC ACGGCCGTGA GGGCCAGCTG

80501 GTCGGGCGCG GTGCGTGCGA TACGTACCCG CAACTTCTCA GCGCCGGGCG
80551 CGTGCACCCG CAACCCGCTC CAGGAGAACG GCAGCAGCAC TTGGTCGGTG

80601 TCGGCGGACG ACGTGACCGC GTCCAGGATC AGCGCGTGCA GCGTGGCGTC 

80651 GAGCAACACC GGGTGCACCT GGTAGCGGTC GGCCCTGCCG CTCTCCGCCT 

80701 CGGGCAGCGC CACCTCGGCG AAAAGGTCGT CCCCGAGCCG CCACGCGCTC

80751 ACCAGTCCCT GTGAGCCGGG CCCGAAGTCA TAGCCGTACG AAGCGAGTTC 

80801 CCCGTACGGA TCCTGCTCGC CGACCGGTGT GGCGCCCGGG GGCGGCCACG 

80851 TCCCGCCGAA CGAGGCGTCC CCGGCGTCGG GCCCCGGGGG AGCGACCACG

80901 CCCGCGGCAT GCCGGGTCCA CACGGCCTCC TCGCCCTCAC CCGTGGGCCG 

80951 CGAATGGACG GTCACGGGAC GCCGCCCGTC CTCGGCCACG GAACCGACCA 

81001 CCACCTGCAC GTCGACCGCG CCCGCACCCT CGTCCCCGAA GGCGAGCGGA 
81051 GTGTGCAGCG TCAGCTCCGC CAACTCCGCG CAGCCGGCCC GCACCGCGGC

81101 CTGCAGCGCG AGCTCCACGA ACGCCGAACC GGGCAGCAGC ACCGTGTCCA 

81151 TGACCCGGTG CTCGGCCAGC CACGCCTGGT CCCGCGGAGA GATCCGGCCG 

81201 GTCAGCAGGT GACTGCCGCC GTCCGCGAGT TCCACGGCGG CTCCGAGCAG

81251 CGGATGCCCC GCGGACGCGA GCCCGAGCCC CGCCGGGTCC CCGGCGAGCC 

81301 CCCTGCGCCC CTCCAGCCAG AACCGCTCCC GCTGGAAGGC GTACGTCGGC

81351 AGATCCACCA CCCGAGGCAG CGGCACGGCC GGGAACCAGC CCGTCCAGTC

81401 GACCTCCGCC CCCGCGCCGA AGGCCTGGGC GGCCGCGCGG GTGAGCTGCG

81451 CGGCGTCGCC GTGGTCGCGG CGCAGGGTGG GCACGACGGT GGCGGGCATG 

81501 TCGGCCCGCT CGATGGTCTC CTCCATGCCG AGGTTGAGGA CGGGGTGGGG 

81551 GCTGGCCTCG ATGAACAGGC GGTAGCCGTC GGCCAGCAGC GCTTCGATGG 

81601 TGTCGGCGAA GCGGACGGGC TGGCGGAGGT TGGTGACCCA GTAATCCGTG 

81651 TCGAGGGTGG TGGTGTCGTC GAGGCGTTCG GCGGTGACCG TGGAGTAGAA 

81701 GGCGACGTCC GTGGTCGTGG GCOGGATGTC GGCCAGGCGC TCGGTGAGGA

81751 GGTCGTGGAG CTGGTCGATC TGGGGGCCGT GGGAGGGGTA TCCGACGTCG

81801 ATGACGCGGG CGGGCAGGCC TCGCGCCTCC GCATCCGCGA <:CACGGCTGC
81851 CACATGCTCC GGCGGCCCTG AAATGACCGT AGAGGAGGGC CCGTTGACGG
81901 CAGCGACACA CACGCCGGGC CGGTCGCCGA TGAGCTCAGC AACCTGCTCC 
81951 GAGCCGGCCC CCAACGACGC CATGTCGCCC TGCCCCATGA GCTGACGGAG 
82001 CGCGTCACTG CGTACGGCCA CGATCCGCGC CGCATCCTCC AGTGACAGTG
82051 CCCCCGCCAC ACACGCGGCG GCCATCTCGC CCTGCGAGTG CCCGATGACG
82101 GCAGCCGGGG TGATGCCGTA ATCGGCCCAC ACCGAAGCCA GCGAGACC^T 
82151 CACCGCCCAC AACACGGGCT GCACGACCTC GACCCGGGAC AGCTCACTCC 
82201 CGTCCCCGCG CAACACCGCA CTCAGCGACC AGTCCACATG CGCCGACAGG 
82251 GCCCGCTCAC ACTCCGCGAT CCGCGCCGCG AAGACGGGGG ACTCGTCAAG 
82301 GAGCTGGGCA CCCATGCCCA CCCACTGCGA CCCCTGCCCC GGAAACACCA
82351 ACACCGGACC CGCGCCGGAG GCGCCCTGTA CGGCGCCCTC GACGACGTCC 
82401 GGTGACGGCT CGCCCGCCGC CAGGGACCGT AGCCCGGCGA GGAGAGTCTG 
82451 GCGGTCCTTG CCCACGACGA CGGCTCGGTT CTCGAACACC GACCGGGTCT 
82501 TGACCAGGGA CCAGCCCACG TCCAGCGGCG ACGCGAGCCG CGGGTCGGCG 
82551 GTGGCGCGGT CGGCCAGCAG GCGGGCCTGG GCCCGCAGCG CCTCCTCGCC 
82601 GCGCGCCGAC ACCACCCAGG OCACCACTCC GGCCGGCGCC GCGGCGTCCT 
82651 CCGCCGGAGC GGTCACGGGC TCCGGCGCGT CCGGGGCCTG TTCCAGGATG 
82701 AGGTGCGCGT TGGTGCCGGA GATGCCGAAG GCGGACACCC CGGCGCGGCG 
82751 CGGGCGTTCG CCGCGCGGCC AGGAGACCGG TTCGGACAGC AGGCGGACGC 
82801 CACTGCCGTC CCAGTCCACG TGCGGCGTGG GCGCGTCGAT GTGCAGGGAG 
82851 GCGGGCAGCT GTTCGTTGCG CAGCGCCATG ACCATCTTGA TCACACCGGC
82901 GACACCGGCC GACGCCTGCG CGTGCCCGAT GTTCGACTTG ATCGAGCCGA
82951 GCCACAGCGG CCGGTCCGCG GGCCGCTCCT TGCCGTAGGT GGCGACGAGC
83001 GCGCTGGCTT CGATGGGGTC GCCCAGCATG GTGCCGGTGC CGTGCGCCTC
83051 CACCGCGTOG ACGTCCTCGG CGGAGAGCCG CGCGTTGGCG AGTGCCTGCC
83101 GGATCACCGG CTGCTGCGCC TGCCCX3TTGG GTGCCGTGAG CCCGTTGCTC
83151 GTGCCGTCCT GGTTGATGGC CGAACCCCGG ATCACCGCCA GGACGTTGTG

83201 GCCGTTGCGC CGGGCCTCCG AGAGCCGTTC GAGTACGACC AGGCCGACTC 

83251 CCTCGGCCCA GCCGGTGCCG TCGGCGGCGG CCGCGAACGG CTTGCACCGG 

83301 CCGTCCTTGG CGAGCCCGCG CTGCAGCGAG AACTCGACGA ACGAGCCCGG 

83351 CGTGGCCATC ACCGTCGCGC CGCCCGCGAG AGCGAGCGAG CACTCGCCCT

83401 GGCGCAGCGC GTGCGCCGCC TGGTGGATCG CCACCAGGGA CGAAGAGcSvG 

83451 CCGGTGTCGA TCGTCATGGC GGGGCCTTCT AGGCCGAGTA CGTACGACAC

83501 CCTGCCGGAG GCGACACAGC CGAGGTTGCC GGTGCCGATG TAGCCCTCGA
83551 CCTCGGTGGG CTGTTCACCG ACGAGCGGGA GGTAGTCGAA GATGGTCAGG 
83601 CCCGTGAACA CCCCGGCGTC GCTGCCCTTG AGGGTCTCCC GGTCGAGGCC
83651 CGCGCGTTCO ATCGCCTCCC ACGCGGTCTC CAGGAGCAGC CGCTGCTGCG 
83701 GGTCCATCGC GACGGCCTCG CGGGGGCTGA TGCCGAAGAA TCCGGCGTCG 
83751 AAGTCGCCCG CGTCGTAGAG GAACCCGCCT TCGCGCACAT AGCTGGTGCC 
83801 GCGGCTCTCC GGGTCCGGGT CGTACAGCGT CTCCAGGTCC CAGCCCCGGT
83851 CGTCGGGGAA GGCCCCCATG GCGTCCTTGC CGGCCGCGAC CAGATCCCAC 
83901 AGCTCCTCGG CGGAGCGGAC GTCGCCCGGA TAGCGGCAGG CCATGCCGAC 
83951 GATCGCGATC GGCTCGTCGT CGGCGGCGCC CCTGGAGGCC CCGGCCGCCC 

84001 GCACCGGGTC GGCGGAGGCC GCCGCGTCAC CGGACAGCTC GGCCCGCAGG
84051 ACGTCGGTGA GCGCGTCGGG GGTGGGGTGG TCGAAGACGA CCGTGGTCGG
84101 CAGTGTCAGG CCGGTGCTCT TGTTCAGCCT GTTGCGCAGC TCCACCGCGG 
84151 TCAGCGAGTC GAAGCCCAGC TCCTGGAACG GCTTGGTGGC GGGCACCGCG 
84201 TCGACGTCCG AGTGCCCCAG CGTGGCCGCC GCCTGGGAGC GCACGTGCTG 
84251 CAGCAGCAAC TGCCGCTGCT GCGCCGGCTT CGCCTCCGTC AGCTCCTGCT 
84301 GGAGCGACGA TGCCTCCGTG GCGTCTTCCT GCTGTGCCGC GGGTGCGCTG
84351 GCCGGCGGGT TCTCGGGCAG ATCGGCGAGG AGGGGGCTGG GCCGCTGCGC
84401 GGTGAACOTC GACGTGAACT GCGCCCAGTC GAAGTTCGCC ACGGTCAGCG 
84451 TCGTCTCACC CGCGTCCAGG GCCTGCTGCA GCGCCTTGAC GCACAGCTCC

84501 GGGCTGAGCG GGTGCAGGCC GAAGCGGCTG AAGAACGTCA ACGCGGCCTG 

84551 GTCCGCCGCC ATGCCCGCCT CGGCCCAGGG CCCCCAGGCG ATGGAGGTGG 

84601 CGGGCAGGCC CTCGGCGCGG CGGTGCTCGG CGAGGGCGTC GAGGAAGTGG 

84651 TTGGCCGCAC CATAGGCGCC CTGCTGGCCA CTGCCCCACA CGCCTGCGCC

84701 CGACGAGAAC ATCACGIU^CG CCGAGAGCGG CAACTCCCGG GTCAGTT^AT 

84751 GCAGATGGTG AGCGGCGAGC GCCTTCGGAC GCAGCACCTC GTCCAGCTCG 

84801 GCACCCGACA CGTCGCCGAG ACCGATGTAG TTCGGCACGC CGGCCGCGTG 

84851 GATGACGGCG GTCAGCGGGT GCTCGGCGGG GACATCGTCG ATGAGGCGTC

84901 GCACCTGCTC GCGGTCGCCG ACGTCGCAGG CGGTGACGGT GACGGCGGCC 

84951 CCCAACTCCG TCAGTTCCGC GGCGAGTTCC TGTGCTCCCG GGGCGTCGGG

85001 GCCGCGGCGG CTGGTCAGGA GGAGGTGCGG GGCGCCCGCA CGGGCGAGCC 

85051 ACCGCGCGAG GACGGCGCCG ATGCCGCCGG TCCCGCCGGT GATGAGAGTG 

85101 GTGCCGTCGG GCCGCCAACC AAGCCCGCTG GCGACCGTGT TGGCGGGCGC 

85151 GTGTGCAAGG CGACGGGCAT GGACGCCGGA CGGCCGGATG GAGATCTGGT

85201 CCTCGTCCTG CGGAACCAGC GCGGCGGCCA GCCGGGCCAG CGTCTGATGG 

85251 TCGATACGAG CGGGCAGATC GACCAGCCCG CCCCACAGCC GCGGATACTC

853 01 CAGCGCAGCG ACGCGCCCCA GCCCCCACAC CTGAGCCTGC ACCGGGTGGG 

85351 TGAGGGCGTC GCCGGCGCTC GTGGAAACAG CCCCCTGCGT GAGAGTGCGT 

85401 ACGGCGATGT CGGCGCCGTT GTCCGCGAGG GCCTGGACGA GAGCGGTCGT

85451 CGCGGCGAGT CCGGCGGGCA CGGCCGAGTG CTCGGGATGC GGCTCCTCGT

85501 CCAGGGCCAG CAGATTGACG ACTCCGGCAA ACGCGGCCCC GTCCATCAGG 

85551 ACACGCAGCT CCTGCGCCAA CTCCGTACGC TCCATGGCAC GTGCGTCGAC 

85601 CACGTGGCGT CGCACCTCGG CACCATGGGC GGTCAGCGTC TGCGCGGTCG

85651 CGAGGACGGC CGGGTOGTCG GCGTGCGCGG CGGGCACGAG CAGCAGCCAG

85701 GCCCCGCTGA GCTCCX3GCOC CGGCAGGTCG GGCAGATGCT TCGAAGTGAC 
85751 CTGATAACGC CAGGAGTCGA CGGTGGACTG CTCGCGGTGC CGACGCCGCC

85801 AGGGCGAGAG GACGGGCAGC GCGGACTCCA GCGCTCCGAC GCTCTCCGCC

85851 TGCCCCTCGA TCTCCAGACT GCCGGCGAGG GCGTCGATGT CCAGGTCCTC 

85901 GATCGCCTGC CACACCCGGG CCTCGACCGG ATCGTGCCCA CCACCCACGG

85951 CTGCGACCGC CGCGGGCGGC TCCACCCAGT AGTGCTTGTG CTGGAAGGCG 

86001 TAGGTGGGGA GGTCGACGGT ACGGGGGGTG GGGTCGGCCG GGAACCAGCG

86051 CCGCCAGTCG ACGGGGGCGC CGGCGGTGAA GGCGTGGGCG GCCGCGCGGG

86101 TGAGCTGGGT GGTGTCACCG TGGTCGCGAC GCAGGGTGGG GATGGTGACG 

86151 GCCGTCCCCG CAGCACCGGC CTGCTGCTCG ATGGTCTCCT GGATGCCGAG 

86201 GTTGAGGACG GGGTGGGGGC TGGCCTCGAT GAACAGGCGG TAGCCGTCGG 

86251 CCAGCAGCGC TTCGATGGTG TCGGCGAAGC GGACGGGCTG GCGGAGGTTG

86301 GTGACCCAGT AGGCGGTGTC TAGGGCGGTG GTGTCGTCGA GGCGCTCTGC 

86351 GGTGACCGTC GAGTAGAACG CCACGTCGGT GGTGGTCGGC TGGATGTCGG 

86401 CGAGCCGGTG GGTGAGGAGG TCGTGGAGCT GGTCGATCTG GGGACCGTGG 

86451 GAGGCGTACC TGACGTCGAT GACGCGGGCC CTGAGTCCCT GCGCCTCCGC 

86501 ATCGGCGACG ACGGCTGCCA CATGCTCCGG CGGGCCCGAA ATCACGGTCG 

86551 ACGACGGTCC GTTGACGGCC GCGACGACTA CGCCCGGCCG GTCGCCGATC

86601 AGCTCTGCGG CCTGCTCGGC ACCGGTGCTG AGCGAGGCCA TGTCGCCGTG 

86651 CCCTTGCAGC TGACGAAGCG CGTCGCTGCG TACGGCTACG ATCCGTGCCG 

86701 CATCCTCCAG TGACAGTGCC CCCGCCACAC ACGCGGCAGC CATCTCGCCC

86751 TGCGAGTGCC CGATGACGGC AGCCGGGGTG ATGCCGTAAT CGGCCCACAC 

86801 CGCAGCCAGC GAGACCATCA CCGCCCACAG CACGGGCTGC ACGACCTCGA 

86851 CCCGGGACAG CTCGCTCCCG TCCCCGCGCA AGACATCACT CAGCGACCAG 

86901 TCCACATGCG CCGACAGCGC CTGCTCACAC TCCGCGATCC GCGCCGCGAA 

86951 GACGGGCGAC TCGTCAAGGA GCTGGGCGCC CATGCCCACC CACTGCGACC

87001 CCTGCCCCGG AAACACCAAC ACCGGCCCAG GACCGACATC ACCGGCCACC 
87051 CCGGCCACCA CATCCGCCGA CGCCTCACCC GCGGCCAATG CCTCGAGGCT

87101 GGCACCAGCC TGAGCCAAGT CCCGCCCCAC GACCACGGCC CGCTGATCGA 

87151 ACAACGCGCG TGTCGTGGCC AGCGACCAGC CCACCTCGGA GACCGACGCA 
87201 TCCGCCAGCC CGGCCGCGAA CTCGCGCAGC CGCCGCGCCT GTTCACGCAA 

87251 CGCGTCCGGC GTCCGCCCGG ACACCACCCA CGGCACGACC CCACCCGGCT 

87301 CAGCCGCCAC GGGGCCCGGC GCGTCCTCTT CCGGCGGCGC CTCCTCCATsA

87351 ATCAGGTGCG CGTTCGTCCC G6AGATCCCG AACGCCGAGA TGCCTGCCCG 

87401 CCGCGTGCGC TCCGCCGGCC AGTCCACGGG CTCGGAGAGC AGTCGTACGC 

87451 TGCCCTGTTC CCACTGGACG TGCX3GTGACG GGGCGTCGAT GTGCAGGGAG 

87501 GTCGGGAGGA GACCGTTGCG CATCGCCATG ACCATCTTGA TGACGCCCGC 

87551 GACACCGGCG GCGGCCTGCG TGTGGCCGAT GTTGGATTTC ACCGAGCCGA 

87601 GCCAGAGCGG ACGGTCCTCC GGGCGCCCCT GGCCGTACGT GGCGATCAGG 

87651 GCCTGCGCCT CGATGGGGTC GCCGAGCGTG GTGCCGGTGC CGTGCGCCTG 

87701 TACGGCGTCO ATGTCCTCGG CGGAGAGGCG GGCGTTGGCG AGGGCGGCGC 

87751 GGATGACGCG TTCCTGGGAG GGGCCGTTGG GGGCGGCGAG CCCGTTGCTC 

87801 GTACCGTCCT GGTTGGTGGC CGAACCCCGT ATCACCGCAA GGACCTTGTG 

87851 GCCGCGGCGC CGCGCTTCGG AGAGCAGCTC CAGCGCCACC ACCCCGGCGC

87901 CCTCGCCCCA GCCGGTGCCG TCGGCGGCGG CCGCGAACGG CTTGCACCGC 

87951 CCGTCGGGCG CGAGCCCCCG CTGCCGGGAG AACTCGGTGA ACGAACCCGG 

88001 CX3TCGCCATC ACCGTCGAAC CGCCCGCCAG CGCGAGCGAG CACTCGCCCT 

88051 GCCGCAGCGC CTGACTTGCC AGATGGATCG CCACCAGGGA CGACQAGCAC 

88101 GCCGTGTCGA CGGTGACCGC GGGACCTTCG AGCCCCACCG TGTAGGAGAT 

88151 CCGGCCCGAC ACCACACTGC <:GAGGTTGCC GGTGCCGATG TACCCCTCGA • 

88201 CGTCGCTGGC CGTCTGGCTG atcagcgtca ggtagtcgtg ggcgctcact 

88251 CCGGTGAAGA CGCCGGTGTC GCTGCCCTTC AGCGCGTGCG GGTTCATGCC

88301 CGCGTGCTCG ATCGCCTCCC ACCGGGTCTC <:AGGAGCAGC CGCTGCTGCG 
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8 8351 GATCCATCGC CGTGGCCTCG -CGCGGGCTGA TGCCGAAGAA CTCGGCGTCG 

88401 AAATGGCCGG CGTCGTACAG GAAGGCGCCG TCCCGCACAT AGCTGGTGGC 

88451 CGGATGCTCC GGATCCGGGT GATACAGCGA CTCCAGGTCC CAGCCCCGGT

88501 CGTCGGGGAA CCCCGCGACC GCGTCACCCC CGTCGCGCAC GAGTTCCCAG 

88551 AGGTCCTCCG CCGACCGGGC GCCGCCCGGG TAGCGGCAGG CCATGCCGAC

88601 GATGGCGACC GGCTCGGTOG ATTCCTTGTC GTGGAGCCGT TGCCGGGCfCT 

88651 GGCGCAGCTC CGCGGTGACC CACTTGAGGT GATCGAGAAG CTTCTCCTCG

88701 TTCGACATCT GACCCAGGCT CCTTGGCGCT ACGTGGTGAT CGGGGCGTAT 

88751 GAGGTTGGGG GAGGGCAAGG GGGCCGGTGT GGCCGGGGCT CATCGCGCTC 

88801 AGGACTGATC GCTGCTCAGG ACTTCCCGAA CTCACTGGAG ATGAGGTCGA 

88851 AGATGTCGTC CGCGCTCGCC GCCTCCAGAT CGGCATGGGC CGAATCAGTG 

88901 CCTTCCGGCC CGTCCTGCGC CGGACTCCAC TTCGACACAA GGACCTGCAG 

88951 CCGGCCCACG ATGCGGCGCC GGGCCGCCTC GTCCACCTCG GCCGCTCCGA 

89001 ACGCCGTGTC CCACTTGTCG AGCGCCGCGA GCACGTCGCC CTCACCTGCG 

89051 ACCTCGGCGC CGTCGCCGAG CTGTCCGCGC AAGTGCGTGG CGAGGGCCTC

89101 GGGCGTGGGA TGGTCGAAGA TCACGGTGGC GGGQAGCGAG AGTCCGGTCG

89151 TGGTGTTGAG CTGGTTGCGC AGCTGGACCG CGGTGAGCGA GTCGAAGCCC 

89201 AGCTCCTGGA ACGGCTTCGC GGCJGGGAATG TCCTCCACCG TGCGGCCGAG 

89251 CGTCGCGGCC GCGTATGTCC GGACCTGCTG GACCAGGAAG CCGAGCCGCT 

89301 GTGATGCGGG CGTCTTCGCC AGCTCCTCGC GGAAGGCGCT CGTCTCGGGG 

89351 GCGGTCCCCG TCTGCTCGGC CTCCCGCTGG TTCTCGGGAA GGTCGTCGAG 

89401 GAACGGACTG GGCCGCTGCG CGGTGAACGT CGGCGTGAAC TTCGCCCAGT 

89451 CGAAGTTCGC CACGGTCAGC GTGGCGTCGC CCGCGTCGAC CGCCTGGTGC 

89501 AGCGCCTTGA CGCACAGATC CGGAGCGATC GGGAGCAGAC CGAAGCGCTT 

89551 GAAGTACGTC AGTGACTCCG GGTCGGCGGA CATGCCfCGCC TCGGCCCAGG 

89601 GCCCCCAGGC GATGGAGGTG GGGGGCAGGC CCTGGGCGCG GCGGTGCTCG 
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89651- GCGAGGGCGT CGAGGAAGTG GTTGGCCGCA CCATAGGCGC CCTGCTGGCC 

89701 ACTGCCCCAC ACGCCGGCGC CCGACGAGAA CATCACGAAC GCCGAGAGGT 

89751 CCAGGTCGCG CGTCAACTCG TGCAGGTTCC AGGCCGCGTC GGACTTCGAC

89801 CCCAGCACCT CGCCGAGGCG CGCGGTCGTC AGATCACCGA TCGCGGTCAG 

89851 ATCGGTCATG CCCGCCGCGT GGATGACGGC TQTGAGGGGA TQCTCGGCCG

89901 GCATGTCGTC GATGAGGCCG CTCAGTTGGC GGGGATCGCT GACGTCGdAG 

89951 GCGGTGATGG TGACGGCGGT GCCGAGCCCG TCGAGCTCGG CGGCGAGTTC 

90001 CCGGGCGCCG GGCGCGTCGG GCCCGCGACG GCTGGTGAGG TGAAGACGGG 

90051 GGGCGCCCTG CCGGGCCAGC CAACGGGCGA GGACGGCACC GATGCCGCCG

90101 GTCCCGCCGG TGATCAGGGT GGTGCCCCGA GGCCGCCAGG tggcctcgct 

90151 gtgcacggga ttctgaatgc ttccgacggc gtgcgtgagg cgccggtggt 

90201 ggattccggt ggggcggacg gcggtctggt cctcgtcgtc ctgggggagg 

90251 agagcggcgg cgaggcgggg gagggtgtgg cggtcgatac gagcggggag 

90301 gtcgacgagt ccggcccaga ggcgcgggtg ttcgagggct gcgacgcggc 

90351 cgagccccca gaggtgagcc tggagggggt gggtgagtgg gtcggtggcg 

90401 gccgtggaca cggcaccctg cgtgacggtg tgcaggggtg cggtcgtgcc 

90451 gttgtcgccg agggcctgga ggagagcggt cgtcgcggcg agccgggcgg

90501 gcacggcggg gtgctcgggg tgcggctcct cgtccagcgc cagcagattg 

90551 acgattccgg caagaccggc cgtgtccacc gcggccagct cctgacgtcc 

90601 cgcccogccg gtctcgaccg gatgcagccg gacggcggcc gccccgtgct 

90651 cgctcaaggc ctcggcggtg gctcgtacgg cggggtgctc cgccttgtcg 

90701 gcagggacga acagcagcca gtcgccgccg agttccggtg cgggcccgtc 

90751 ggaccgctgt ttccacgtga cgcggtaccg ccaggagtcg atggtcgcct 

90801 ggtcctggtg ccgacgccgc cagcccttga gcaccggcaa cgcgggctcc

90851 agcgcccgga ccgcctcctc gctgccctcc tccgacccca gcgtctcggc 

90901 caggaoaccg agatcgagct cctcgagggc gtgccacagc tgggcctcgg
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90951 CCGCACTCTG CTCACCGCTG ACGGCGCCCG AGGCGGACGC GGAACGTTCG 
91001 AGCCAGTAGT GCTGGTGTTG GAAGGCGTAG GTGGGGAGGT CGACGGTGCG 
91051 GGGGGTGGGG TCGGCCGGGA ACCAGCGCCG CCAGTCGACG GGGGCGCCGG 
91101 CGGTGAAGGC GTGGGCGGCG GCACGGGTGA GCTGGGTGGT GTCGCCGTGG
91151 TCGCGGCGGA GGGTGGGGAC GACGGTGGCG GGGATGTCCG CCTGCTCGAT 
91201 GGTCTCCTCC ATGCCCAGGC CCAGCACGGG GTGGGCGCTG GCCTCGA'f&A 

91251 ACAGGCGGTA GCCGTCCGC6 AGAAGGGCTT CGATGGTGTC GGCGAACCGG

91301 ACCGGCTGGC GGAGGTTGGT CACCCAGTAA TCCGTATCCA GGGCTGTGGT

91351 GTCCGTCAGA CGCTCGGCGG TGACGGTCGA ATAGAAGGCC ACGTCCGTGT

91401 TCGCGGGCCG GATGTCAGCC AGGCGTTCGG TCAGCAGATC GTGGAGCTGG 

91451 TCGATCTGGG GGCCATGCGA GGCGTACCCG ACGTCGATGA CACGGGCGCG 

91501 CAGACCACGT GCCTCCGCAT CGGCGACCAC GGCAGCCACA TGCTCCGGCG 

91551 GCCCTGAAAT CACCGTAGAC GACGGCCCAT TGACCGCCGC GACGACCACG 

91601 CCCGGCCGGT CACCGATCAG CTCAGCGGCC TGCTCGGCAC CGGTGCTCAG 

91651 CGAGGCCATG TCACCGTGCC CTTGCAGCCG ACGAAGCGCG TCACTGCGTA

91701 CGGCTACGAT GCGCGCCGCA TCCTCCAGCG ACAGCGCCCC CGCGACGCAC

91751 GCGGCAGCCA TCTCACCCTG CGAGTGCCCG ATCACAGCAG CCGGAGTGAC 

91801 CCCGTAATCA GCCCACACCG CAGCCAGCGA GACCATCACC GCCCACAACA 

91851 CCGGCTGCAC GACCTCGACC CGGGACAGCT CACTCCCATC CCCGCGCAAC 

91901 ACCGCACTCA GCGACCAGTC CACATACGCC GACAGC<3CCC GCTCACACTC 

91951 CGCAATCCGC GCCGCGAAGA CGGGGGACTC GTCCAGCAGC TGGGCACCCA 

92001 TGCCCACCCA CTGCGACCCC TGGCCCGGAA ACACCAACAC CGGCCCAGGA 

92051 CCCACATCAC CAGCAACCCC GGCCACCACA CCCGCCGAAG CCTCACCCGC

92101 AGCCAACGCC CGCAGGCCAG CCGTCAACGC ATCOCGGTCA CGCCCCACCA 

92151 CGACAGCCCG GTGCTCGAAC AC<:GACCGGG TCGTGGTCAA CGACCAGC-CC 

92201 ACATCAGCCG CCGACGCATC GGGC<3GCCGG GCCGCGAACT CGCCCAGCGG 
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92251 CCGCGCCTGT GCACGCAGCG CGTCCGGCGT CCGCCCGGAC ACCACCCACG 

92301 GAACGACCCC ACCCGGCTCC TCGGCCACGG AGCCCGGCAC GTCCTCCTCC

92351 TCCGGTGGTG CCTCCTCCAG GATCAGATGC GCGTTCGTCC CCGAGAAGCC 

92401 GAACGAGGAC ACCCCCGCCC GGCGCGGGCG CTCGCCCCGG GGCCACTTCA

92451 CCGGGTCGGT GAGCAGGCGC AGCCCGCTGC CGTCCCACTC CACGTGGGGC 

92501 GAGGGGGCGT CGACGTGCAG GATGGCGGGC AGCAGGTCGT GCCGCAgSgG 

92551 CAGGACCATC TTGATGACAC CGGCCACACC GGCGGCGATC TGCGTGTGGC 

92601 CGATGTTGGA CTTCACCGCT cScACCCACA GCGGCCGGTC CTCCGGCCGT

92651 TCCCGGCCGT AGGCGGAGAT GAGAGCCCCG GCCTCGATGG GGTCGCCGAG 

92701 CGTGGTGCCG GTGCCGTGCG CCTCCACGGC GTCGATGTCC TC6GGGGCGA 

92751 GGCGGGCGTT GGCGAGGGCG GCGCGGATGA CGCGTTCCTG GGCGGGGCCG 

92801 TTGGGGGCGG TCA6GCCATT GCTCGCGCCG TCCTGGTTGA TCGCCGAACC 

92851 CCGGATCACC GCGAGGACCT TGTGGCCCTT CCTGCGGGCG TCGGAGAGAC 

92901 GCTCAAGGAG AACCACCCCC GTACCCTCCG CCATGCCCAT GCCGTCGCTG 

92951 CTCGCCGAGA ACGGCTTGCA CCGTCCGTCG GGGGCCAGGC CGCGCAGTTC 

93001 GCTGAAGCCG ATCAGCGGGG CGGGCGACGA CATCACGTAC GTGCCGCCCG 

93051 CCAGCGCCAG CGAGCACTCC TGTGTGCGCA GGGCCTGGGT GGCGAGGTGA 

93101 AGGGAGACCA GCGACGAGGA GCACGCCGTG TCGACCGTCA CCGCGGGGCC

93151 TTCGAGGCCC AGGGTGTAGG CGACGCGGCC GGAGGTGACG CTGCCGGAGT 

93201 TGCCGATGGT GAAGTATCCG GCGGTGCCCT CGGGGACCTC GGACGCGCCG

93251 AGGGCGTAGT CGAGTCCGTC ACAGCCGATG AAGGTGCTGG TGTCGCTGGA 

93301 GCGGAGGCTG AGGGGGTCGA TGCCGGCCCG TTCGATCGCC TCCCACGCCG 

93351 TCTCCAGGGC GAGCCGCTGC TGCGGCGCCA TGGCCGCGGC CTCGGTGGGT 

93401 CCGATGCCGA AGAAtSGTGGG GTCGAAGTCA CCGGGGTCGT AGACGAAGCC

93451 GCCTTCCCGG ACGTAACTGG TGCCGGTGCT CTCGGGGTCC GGGTCGTAGA 

93501 GGGAATCGAG GTCCCAGTTG CGGTTGCCGG GCAGGGGCGC GACGGCGTCG 
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93551 CCGCCGGTGG AGACCAGCTC CCAGAACTCT TCGGGAGACC GGACTCCGCC 

93601 GGGCAGCCGG CAGGCCATGC CGATGACCGC GACCGGTTCG TGGCCCGCCG 

93651 ACTCGACGTC CTGCAGCCGG CGTTCCGTCT GACGCAGGTC CGCGGTGACA 

93701 CGCTTGAGGT ATTCCAGAAG TTTCTCTTCG GTGTGCGCCA TCCCGGTGAC

93751 AACCGCCCCT CTCCGCGAGA ACAGACCGCA GACTCGTCGA CGGCGCTAAA 

93801 GCCCTCCTAA TACTCGGCTG TGTACCGCTC GCTGCCACGG GTGTCCGCft.C 

93851 TGGTCGGAGG CTCCGGCCCA GGGAACAGGG GCTTTCTTAG GGGCGCTTAA

93901 GCGGTGCCTG CCAGGGTGTG CCGGTGTCAG GCCGTCACGC CCTGATCAGC 

93951 GGCGTCGCCC GTGCCGTGCC CGTGCGGTCG GTGGGCCTGA CCGTCGGTCC 

94001 GGACAACGCG AAGCGAGGCA TCGTGCCCAT CACGGATAGC AAGCCGGCCG 

94051 CCACATTCCC CGACCTGGTC GACCCGTCGT TCTGGGCGCG GCCGCACGCG 

94101 GAACGCGTGG CGCTGTTCGA GGAGATGCGC GGGCTGCCGC GGGCGGCGTT 

94151 CATCCGGCAG AACATGCCCG GCGTGCCCTG GACGTTCGGC TACCACGCGC 

94201 TGGTCAAGTA CGCGGACATC GTGGAGGTGA GCCGCCGCCC GCAGGACTTC 

94251 TCCTCGAACG GCGCGACCAC CATCATCGGT CTGCCGCCCG AGCTGGACGA 

94301 GTACTACGGC TCGATGATCA ACATGGACAA CCCGGAACAC TCGCGGCTGC 

94351 GGCGCATCGT CTCGCGTTCG TTCGGCCGCA ACATGATCCC CGAGTTCGAG 

94401 GCCGTGGCGA CCCGCACCGC CCGCCGCATC ATCGACGAGC TCATCGCGCG 

94451 GGGACCCGGC GACTTCATCA GGCCCGTCGC CGCGGAGATG CCCATCGCCG

94501 TGCTCAGCGA CATGATGGGC ATCCCGGCGG AGGACCACGA CTTCCTCTTC 

94551 GACCGGTCCA ACACGATCGT CGGCCCCCTC GACCCGGACT ACGTGCCGGA 

94601 CCGGGCGGAC TGCGAACGCG CGGTGATCGA GGCGTCACGC GAAGTCGGCG 

94651 ACTACATCGC TGGCCTTCGT GCGGAACGGC TCGCCGCCCC CGGCAACGAC

94701 CTCATCACCA AGCTCGTGCA AGTCCAGGCG GACGGCGAGC AGTTGACGCG

94751 GCAGGAACTC <3TCTCCTTCT TCATCCTGCT -CGTCATCGCC -GGGATGGAGA 

94801 GCACCGGCAA CGCCATCTCG CACGC<3CTGG TACTGCTGAC CGAGCATCCC 
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94851 GAGCAGAAGC AGCTGCTGCT CTCGGACTTC GACACGCACG CGCCGAACGC 
94901 GGTCGAGGAG ATCCTCAGGG TCTCCACGCC CATCAACTGG ATGCGGCGCG 
94951 TCGGCACCCG CGACTGCGAC ATGAACGGCC ACAGGTTCCG CAGGGGCGAC 
95001 CGGATCTTGC TGTTCTACTG GTCGGGCAAC CGGGACGAAT CCGTCTTCCC 
95051 TGACCCGTAC CGGTTCGACA TCACGCGCGG GACGAACGCG CACGTCACGT

95101 TCGGCGCGGT GGGCCCGCAC GTCTGCCTCG GGGCCCACCT CGCCCGTJTtG

95151 GAGATCACCG TCCTGTACCG GGAGCTGCTC GCGGCGCTGC CCCAGATCCA

95201 TGCCGTGGGG eAGCCCCGCA GGCTGGACTC CAGCTTCATC GAAGGGATCA 

95251 AGCACCTGCA CTGCGCCTTC TGAGCACATA CGCTTCCCTC TGCGCATGTG

95301 CGCTCACGAC GCTCCGATCA GCGACTGCCA ACGACTGTCA GCGACCGGAC 

95351 AGGGCCAAGG GCGGTGGGGA CATCAGGTGC ATGTCACCCG CGAGTATGGC

95401 CCGCTGCAGC TCCTGGAGCG GGCGCCCGGG TTCGAGCCCC AGCTCGTCGT
95451 TGAGCGTCTT GCGCACCGAC TGGTACACCT TCAGCGCGTC CGCCTGCCQC 

95501 TCGGAGCGGT AGAGCGCCAG CATCAGCTGG CGGTAGAACG CCTCGCACAT
95551 CGGGTTCTCC GCGGTGAGGG CGTACAGCAT GCCCACGGCC TCGCGGTGCC 
95601 GGCCGAGCTG GAGCTGGCAC TCGACGAGCA TCTCCTGACA CTCCAGGCGG
95651 ATCTCGGTCA GCCAGGTCGA GAAGCCGTCG ATGATCGGGC CGTTGGTGCC 
95701 GGGACCGTTC CCGCCCTGCC CGAGGATCGG GCCGCGCCAC AGCGCGAGCG 
95751 CCTGCCCGAA ACAGGAGGCC GCCTCGTCGA ACCGCTTCTC CCTGAGCAAC 
95 801 GACC<3CCCCA CGTCCACCAG TTCGGGGAAG ATCTGGGCAT GGATCTGGTC 
95851 GTCGTCCCGC TTGTGCAGGA CGTACCCCGG CGCACGGGTC TCGACGGGGT
95901 TGCCCGCCGA ACCGGGCACC TTGAGGAACT TGCGGAGCTG GGAGATGTAC 
95951 ACATGCAGTC CCGCCGTGGC GCOCCGCGGC AGGTCCTCGC CCCAGATCTC 
96001 ccGCATCAGC tgctc<::aggg AGACCACCCG GTCGGCGCGG ATGAGGAGCA . 
96051 CGGTGAGGAC <3ATCTCCACC TTCTGGGCGT TGATGGTGGC GTAGTCGTTT 
96101 CCX5TCCTTGA TGCGGAGC-GG GCCCAGCATT TOGTATCTCA CCGAGCGTTC 
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96151 CCCCTTGCTG TCGCACGCTG CTGCGCACTG TCGGCCAGGG CCTTGGAGAT 

96201 GACTTCCGTG ACGCCCTGCT GGTGCGTGTT CAGATAGAAG TGGGCCCCGG 

96251 CGAAGACCTT GAGGTCGAAG GGGCCTTCCG TGTGCTGCTG CCAGGCCTCG 

96301 ACCTCGTCCA GCGGCGCCTG CGGGTCCCGG TCGCCCACCA GGGCGGTGAT 

96351 GGGGCAGGAC AGCGGCGGCG ACGGGTTCCA CCGGTACAGC TCGACCGCCC 

96401 GGTAGTCGTT GCGGACGACC GGGATGATCT CCGCGAGCAG TTCCTCGT?fcG 

96451 TCCAGGAACC GCGGGTCAGT GGCACCGGCC CGGCGCAGCT CGGCGGCCAA 

96501 CTCGGTGTCG TCGAGGAGGT GTACGGTGCC GCGCCGGAAG CGGGACGGCG 

96551 CGCGGCGTCC CGAGACGAAC AGCCGGCAGG GCTGCTTCCC CGTGCGCTCG 

96601 CGGAGCCGCT GGGCGACTTC GTAGGCGAGG ACGGCGCCCA TGCTGTGGCC 

96651 GAAGAACGCC AACGGGCGGT CGTCGAACGG GCCGAGCGCA TCGGTGATGA 

96701 GGTCGGCGAG TTCCCCGATG TCGTCCAGGA GCCGCTCTCT GCGGCGGTCC 

96751 TGTCGCCCGG GGTACTGCAC CGCGAGGACC TCGCTGTCGG TCGGGAGAGT 

96801 GGGGGATTGC GCAAGGGGGT GGTAGTAGGA GGCCGAGCCG CCCGCGTGGG 

96851 GGAAGCAGAC CAGGCGAACG ACGGCTTCCG GTCGGGGCCG GAAGCGACGT

96901 ATCCAAGGGT CCGACATATC GGGTGGGGGG AAGGCAGACA AGATCTTTCC 

96951 CTTCGCCAGG AACGGTGACA ACGGTGTGTC GCCACATCAC ATAGCCGCTC 

97001 CTGATCATGC GCAGCTCAAA GTTTAAACGG CAACGTCGCT AACGGGGGAG 

97051 CAGGGCGGAA TCAGACATTC CCCATCCTTT ATTCCGCGAT TCTTACGTGA 

97101 TCGAATCCCG GCGGCCAAGA TGGAGTAAAT TTCAATATGA ATGCTTAACG 

97151 CCGCACAGCT TGTACGGCGG GCCGCCCGGG CGGTGACTGG CGTCCCTGCC 

97201 AGCCGTGATG GCCTGACGAG GCCTCCGGGA TCCATCCCCC GCCCGCTGTC

97251 GCCGAGTTCT TTGCGGGATT ATTACGTTGC ATTGGTTTGC TTCGTGGCCC

97301 <3GGCC<3TTGG CCTGCGCTAT TTGGCAGCCT TCCGTCATGG GTGGTAAAAG 

973.S1 ATCGCCTTTC CC-CTCTGGGG - TGCCGGTC<3AOCTGGCCTCG ACCGCGATTG 

97401 TGGCTTGTTG TTTTCTTGTG GCGCCGCGTG TGAAACAGCG GCAGTTGGCC 
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97451 ACTCGCTCTG ACAGGCTCCG GGGACGGGGT TGTCACCTTT TGGGGTGACT 

97501 GGCCTCGTTC AAGGCGTCCT GGCCCGTGGT GCATCCGCGA TCGTCGT(3CC

97551 ATGGGTGAAG TGGGAAGGAG CACAGAACGA TGAGCGAGAG CATGGCGTGG

97601 GTGACGCGGG ACGTCCGCAA GGCCCGCAAG GAGGGCAGTG CGGGGACCGC

97651 GCGGCGCCGA GCCGACCGGC TGGCGGACCT GGTCGCCCAC GCCCGCTCGG 

97701 CGTCGCCGTA CTACCGGGAG CTCTACCACG GCCTGCCCGA GCGGATCCSVG 

97751 GACCCGACGC TGCTGCCGGT GACGGACAAG AAGCAGCTGA TGGACCACTT 

97801 CGACGACTGG CCGACGGACC GCGACATCAC CTTCGAGAAG GTCCGCGCGT

97851 TCACCGACGA CCCCGAGCTG ATCGGGCGGC GCTTCCTCGG CCGCTATCTG
97901 GTGGCCACCA CGTCGGGCAC CAGCGGCAGG CGCGGCCTGT TCGTGCTCGA 
97951 CGACCGGTAC ATGAACGTGT CCTCCGCCGT CTCCTCCCGG GTGCTCGCCT 
98001 CCTGGCTCGG CCCCCTCGGC ATCGCCCGGG CCGTCGTCCA CGGCGGCCGC 

98051 TTCGCCCAAC TCGTCGCCAC CGAGGGACAT TACGTCGGCT TCGCCGGATA
98101 CTCCCGCCTG CGCCAGGACG GCGGAGCGCG CAGCAAGCTC GTCCGCGCGT
98151 TCTCTGTGCA CGAGCCGATG TCACGTCTGG TCGCCGAACT CAACGAGTAC 
98201 CGGCCCGCGT TCGTCATCGG CTACGCCAGT ACGATCATGC TGCTCACCGC 
98251 CGAACAGGAA GCGGGCCGGC TGCACATCGA CCCGGTGCTG GTCGAGCCCG 
98301 CGGGCGAGAC GATGACCGAG AGCGACACCG ACCGCATCGC TGCGGCGTTC 
98351 GGCGCCAAGG TGCGCACGAT GTACAGCGCG ACCGAGTGCA CCTACCTCAG 
98401 CCACGGCTGC OCCGAGGGCT GGTACCACGT CAACGACGAC TGGGCCGTGC 
98451 TCGAACCGGT CGACGCCGAC CACCGGCCCA CCCCGCGGGG GGAGTTCTCG
98501 CACACCACCC TGATCAGCAA CCTCGCCAAC CGCGTCCAGC CGTTCCTCCG 
98551 CTACGACCTG GGCGACA<KX3 TCATGCTCCG CCCCGACCCC TGCCCCTGCG 
98601 GCACCCCCTC GCCOGCGATC CGGGTCCAGG GCAGGTCGGG CGACATCCTC 
98651 ACCTTCCCCT CGGGCCGGGG CGACGACGTC AGCCTCGCCC CGCTCGCCTT
98701 CAGCAGCCTC TTCGACCGCA TGCCGGGAGT C<5AGCTCTTC CAGATCGAGC 
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98751 AGACCGCGCC GTCGACCCTG CGCGTCCGCG TGGTCCAGGC GCCCGGCGCC 

98801 GACGCGGACC ACGTGTGGCA GCGGGCCCAC GACGGGCTGA CCCACCTCCT

98851 CGCCGACAAC AAGCTCGACA ACGTAACCGT CGAACGGGGC GAGGAGCCGC 

98901 CGCGGCAGGC ATCCGGCGGC AAGTACCGGA CGATCATCCC GCTCGCCGCC

98951 TGAACGCTCG CCGACTAGCC GCGCGCCGCC TGAGCTGCTC TCACCGCGCG

99001 TACGGGCGCA GCGGAGGCTC CTCGTCGACC CACGGCTGGC TGTGGATCAG 

99051 CAGCTCGATC GGGAAGTTCA GCAGGCCGGG CAGGGCGTCG ACGGCCTCCT 

99101 GGCTGTTGAG CGGCATGACC GfeCTTGGCGC AGTGCGCGCG GTCGATGCGG 

99151 CTCGTGTCGG CGGGACCGGG GTGCTCGATC GCATCGGCGA CCAGGTCGTA 

99201 GCTGACCTGG TCGACGACCA TGGCGATGTG GGTCGGCCAC GGCCGACCCG 

99251 GACAGATGTC CTGGAGGCCG ATGCGGTGGG CGCCCGGCAG CGACGGCGCC 

99301 TCCCCGTCGG CCACCACGGA CTCGTCGGCG TATGAATAGA TCGTGGTGTA 

99351 CGACGGTCCT GCGGGCATGG GCGTGCCGTC GGCGCCCAGA GCCTTCGACC 

99401 AGTTCGAGTC GCGGGCGAAC TGCAGGACCG ACGCCGGGCA GCCCGCCACC 

99451 TCGGCGATCG GGCGGCAGGG CGAGGCCAGC CGGGTCCCCT GGAACGGGGA 

99501 GCCCAGGGTC ACCATGTCGT CGACCTTCCC CGGCAGGTCC GGCCAGAAGC 

99551 GCAGGGCCCA CGCCGTGAGG AGGCCGCCCT GGCTGTGCCC GACGAGATCG 

99601 ACCTTCCGGC CGGTGGCCTC CTGGATCGCG CGGGTCGCGT ACACCACGTA 

99651 CTCGACGGAC TCCTGCATGT CACGGAGCCC GCGACCGGGA GAATCCACCC 

99701 AACAGGACTG GTAGCCCTTC TTCTTCAACT CGGCCATGTA GTTCCAGGCG 

99751 TAGTTCTCCT CGCCCTTGAG GCCGGTCCCG GGCACGAAGA GGACGGTCGG 

99801 CTTGTCACCG GCGTCACGGA GGTCCCCCAG CTCCGTCCCG CAGTGCAGCG

99851 CCTTGGCGAG CTCGGCCGCC GGTATCTCCA ACGGGGGAGA GGAAACATCC 

99901 <3CCGCCGAAG CGGCGGAGGC CGGAAGCACG GTGGCGGCCA GCACGGCCGC 

99951 CACGAGTCCG CCGAGCCATG AGGACAAGGG CACGGTGACC TCCACAGGAA

100001 CCTTCACGAG TGAGCGGAAA CTCCCTCCGG AGGGAGCACC TCATCGTGCG 
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10 0051 GCGGCGCCAC AGTAGCCGTC AACTGCCCCA CGGGGCTGAG TAGTTGACAG 

100101 TTGGCCGGGC TCGGCCGGCG AAGCGCCCGG GCCCCGCCGC CCCGCGCCGT 

100151 GCGCGAGGGG TCCGTGACCT GGGTGGACGG TCCGGTTGGA CATCCCGGGG 

100201 GAGCCTCTGG CATGGTCGCC CGTCCGTCCC CCTCAAGAAC CGAAGGGAGC 

100251 GTCACGATCA CGATGATCGA AGTCAGCACG CGCAGCATGA AGGAAGCGGC 

100301 TGCCGCCOAG CAGCTCCGCG CGGAGACCAC GACACTGGAC ATTCCAaSsG 

100351 GTTTCGACCT GTGGACGGCC GACGAGATCG CGGACTGGCT CGACGGCGTC

100401 GAGGACGACC CGGCAGTCTC dGACGCCGAC TTCTACGCGG CCCAGCAGCG 

100451 GTGCGACGGG TCCTCGGCAC CGAGGGCACC TGACCCGCCG GCGGCCCTGC

100501 GCGGCCCTAC GTGTGCAGCG CCCCGTCCTC CTCCACATGC CCCTCCGGCT 

100551 CCAGCTGGAT CGTCGAGTGG GCCACGTCGA AGTGGCCCCC GACACACCGC 

100601 TGAAGGCGCC CCAGGAGCTC CGCGTACCCG CTCGCGAGAG CCTCCTCCGT 

100651 GACCACCACG TGCGCGGTGA GCACCGGCAT CCCCGAGGTG ACCGTCCAGC 

100701 CGTGCAGATC GTGCACGGCG ACCACGCCCC GCTCCTCCAG CAGGTGCCGG 

100751 CGCACCTCGC CGAGGTCGAC GTCCTGCGGG GTCGCCTCCA GCAGGACGTG 

100801 CAAGGAGTCC CGCAGCAGGC CGTACGCGCG CGGCACGATC AGCAGGCCGA 

100851 TGACGATCGA CGCGATCGGG TCGGCGGCCT GCCACCCCGT GAGCAGGATG 

100901 ACCAGGCCGC CCACGATCAC CGCGACCGAG CCGAGCGCGT CGCCCAGCAC 

100951 CTCCAGGTAC GCGCCCCGCA GATTGAGGCT CTTCTCCTTG GCGTCCCGCA 

101001 GCAGCCACAG GCCCACCAGG TTGGCGGCGA GCCCGCCCAG CGCGACCACG 

101051 AACATCAGGC CGCCCTTCAC CTCCACCGGC TCGCTGAACC GGCCGATCGC 

101101 CGACCACAGG ACCCAGGCGA AGATGACGAC CAGGAGCAGC GCGTTCAGGA 

101151 CCGCGGAGAA GATCTCCACG CGGTAGAACC CAAAGGTGCG CCGCGGCGTC 

101201 GGCGCCCGCT GGGCGAGGGT GATGGCACCG ACGGCCAGCG AGACGCCGAC 

101251 CGCGTCGGTC AGGCTCTGGG CGGCGTCGGC GAGCAGCGCG AGGCTGCCGG 

101301 ACAGGAGCGC GCCGACCACC TGGATGACGG TGATCGAGCC OCTGATGCCG 
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101351 ATGGTCCACA GCAGGCGCTT GCGGTACGTG CCGCTGAGAG TGCCGCCCGC 

101401 CGCCCCGGCG GACGGACCGT GGTCGTGCCC CATGCCCGCG AGTGGACCAC 

101451 GGCGGCGCGG CACCCGCCAC CGAGCGGCCG CCGGTCGGCT CAGTGCAGCC 

101501 GGGCCTGGGT GGAGGTGTCG CGCTGGTGCG GGATGCCGAG CGGCGGCGGC 

101551 AGCTCGCCCT GCTGCACCCT GACCGTGCGC ACGGGGGCGG GGACCCGGAT

101601 GCCCTCGGCG CGGTAACGCT GGTGCAGGCG CTTGATGAAC TCCTGCTTCA 

101651 TGCGGTACTG GTCGCTGAAC TCGCCGACGC CGAGGATCAC CGTGAAGCTG 

101701 ATCCGCGAGT CGCCGAAGGT GTOGAAGCGG ATCGCCGCCT CGTGGTCGGG 

101751 GACCGCGCCG GTGATCTCGG CCATCACCTC GTCCACCACC TCGGTCGTGA

101801 CCTTCTCGAC CTGCTCCAGG TCGCTGTCGT AGCTGACCCC GACCTGCACC 

101851 ATGATCGACA GCTCCTGCTC GGGGCGGCTG TAGTTGGTCA TGTTGGTGCC 

101901 GGCGAGCTTC GCGTTGGGGA TGATGACGAG GTTGTTGGAG AGCTGGCGGA 

101951 CCGTGGTGTT GCGCCAGTTG ATGTCGACGA CGTAGCCCTC CTCCCCGCTG 

102001 CTGAGCTGGA TGTAGTCGCC GGGCTGCACG GTCTTCGCGG CGAGGATGTG 

102051 CACGCCCGCG AAGAGATTGG CGAGCGTGTC CTGCAGTGCG AGGGCGACCG 

102101 CGAGAGCTCC CACGCCGAGG GCGGTGAGCA GGGGTGCGAT GGAGATGCCG 

102151 AGGGTCTGAA GGACGATGAG GAAGCCCATC GCGAGCACCA CGACGCGGGT 

102201 GATGTTCACG AAGATGGTGG CCGATCCGGC CACTCCGGAG CGGGACTGTG 

102251 CCACGGCCTT CACCAGGCCG GTGACGATCC GGGCCGCCGT GAGCGTGGCG 

102301 GCCAGGATGA GCAGCGCGGT CAGCGTCATG GTGACGTTGC GTCCGGTGCG 

102351 CGGCGTGAGC GGCAGCGCGC CCGCCGCGGC GGCGAGCCCG GCGGTGATGG 

102401 CCGCGCAGGG CACGAGGGTG CGCAGGGCGT CGACGATGAC GTCGTCACCG 

102451 CTCCACCGGG TTTTGCTCGC CCGTTCGCCG AGCCACCTCA GAAGTGCGCG 

102501 GAGCAGCAGC CCGGCGACGA CGCCGGCGAC GACCGCGATA CCGGCCACGA 

102551 TCCAGTCGTG CAGTGTGAGG GCACGGGTCA TCAGTTCGCT CCCGTCGTAC 

102601 GGGGGGAOTC CGCCTGTCTG GGGGGTATGT GATGTGACGT CACCTTGTGA 
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102651 TACCTGCTCG ATTCCGGGGA GTGCGGTCAC GCCGGGACGA GAGCTCGGTT 

102701 CCGGCGCGGA CGTCATCCTG CCCCATCCGC CCACGGCAGG CGTGCATACC 

102751 CCCACCTGGA TCTTCACAGA CCGGCCACGT CTGTCCATGC GCCGATGAGC 

102801 GCGCTGCCCG TGGTAAAGCA TTGAGTCAGG CGATTTGGCC ACTCGGCACT 

102851 CGGCGGACCG GTCGAGCCGG TCGATCTACG TGAGCGGAGG CGGTTGAGCA 

102901 TGGCGTCCAT GTGCAGACCC GGAATGTCAC CCGTCAATTC GCACAACCSkG
102951 TGGGATCCGC TGGAGGAGAT CATCGTCGGG CGGCTGGAGG GCGCGACCAT 

103001 TCCCTCCAGC CATCCGGTCG TCGCGT6CAA CATCCCGACC TGGGCGGCAC
103051 GGCTGCAGGG TCTCGCCGCC GGGTTCGAGT ATCCGCAGCG GCTGATCGAG 
103101 CCGGCGCAGC AGGAGCTCGA CCAGTTCATC GCTCTCCTGC AATCCCTCGA 
103151 CGTCACAGTG AGACGGCCGG CGGCCGTCGA CCACAAGCAC CGCTTCGGGA 
103201 CCCCCGACTG GCAGTCGCGC GGCTTCTGCA ATTCCTGTCC GCGGGACAGC 
103251 ATGCTCGTCG TCGGCGACGA GATCATCGAG ACCCCGATGG CGTGGCCGTG 
103301 CCGCTGTTTC GAGACGCACT CGTACCGCGA ACTCCTCAAG GACTACTTCC 
103351 GGCGCGGCGC GCGCTGGACG GCGGCGCCGC GCCCCCAGCT CACCGAGGCC 
103401 CTGTACGAGA AGGACTTCCG CCCTCCCGAG GAGGGCGAAC GATGCGCTAC 
103451 ATCCTCACCG AGTTCGAGCC GGTGTTCGAC GCGGCGGATT TCGTGCGGGC 
103501 GGGCCGCGAC CTGTTCGTGA CGCGGAGCAA CGTCGCCAAC CTGCTGGGCA 
103551 TCGAGTGGCT GGGCCGCGAC CTTCGGGCCG GAGTACCGCG TGCCACGAGA 
BUDAPEST TREATY ON THE INTERNATIONAL
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS
FOR THE PURPOSES OF PATENT PROCEDURE 



Professor P.F. Leadlay,
Department of Biochemistry, 
University of Cambridge, 
80 Tennis Court Road, 
Cambridge. 
CB2 1GA



INTERNATIONAL FORM 

RECEIPT IN THE CASE OF AN ORIGINAL DEPOSIT 
issued pursuant to Rule 7.1 by the 
INTERNATIONAL DEPOSITARY AUTHORITY 
identified at the bottom of this page 



NAME AND ADDRESS 
OF DEPOSITOR 



IDENTIFICATION OF THE MICROORGANISM 



Identification reference given by the 
DEPOSITOR: 

Escherichia coli 
XL1-Blue MR (MO-CN11)



Accession number given by the 
INTERNATIONAL DEPOSITARY AUTHORITY: 



NCIMB 40956 



II. SCIENTIFIC DESCRIPTION AND/OR PROPOSED TAXONOMIC DESIGNATION



The microorganism identified under I above was accompanied by: 
[ ] a scientific description

[x] a proposed taxonomic designation
(Mark with a cross where applicable)



III. RECEIPT AND ACCEPTANCE 



This International Depositary Authority accepts the microorganism identified under I above, which was received by it on 
1 July 1998 (date of the original deposit)¹

IV. RECEIPT OF REQUEST FOR CONVERSION



The microorganism identified under I above was received by this International Depositary Authority on 

(date of the original deposit) and a request to convert the original deposit to a deposit under the Budapest Treaty was received by it on 

(date of receipt of request for conversion) 



V. INTERNATIONAL DEPOSITARY AUTHORITY



Name: NCIMB Ltd., 



Address: 23 St Machar Drive,
Aberdeen, 
AB24 3RY, 
Scotland. 



Signature(s) of person(s) having the power to represent the
International Depositary Authority or of authorised
official(s):




¹ Where Rule 6.4(d) applies, such date is the date on which the status of International Depositary Authority was acquired.

Form BP/4 (sole page) 
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BUDAPEST TREATY ON THE INTERNATIONAL 
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS
FOR THE PURPOSES OF PATENT PROCEDURE 



Professor P.F. Leadlay, 
Department of Biochemistry, 
University of Cambridge, 
80 Tennis Court Road, 
Cambridge. 
CB2 1GA



INTERNATIONAL FORM 

RECEIPT IN THE CASE OF AN ORIGINAL DEPOSIT 
issued pursuant to Rule 7.1 by the 
INTERNATIONAL DEPOSITARY AUTHORITY 
identified at the bottom of this page 



NAME AND ADDRESS 
OF DEPOSITOR 



IDENTIFICATION OF THE MICROORGANISM 



Identification reference given by the 
DEPOSITOR: 

Escherichia coli 
XL1-Blue MR (MO-CN33)



Accession number given by the 
INTERNATIONAL DEPOSITARY AUTHORITY: 



NCIMB 40957 



II. SCIENTIFIC DESCRIPTION AND/OR PROPOSED TAXONOMIC DESIGNATION



The microorganism identified under I above was accompanied by: 
[ ] a scientific description

[x] a proposed taxonomic designation
(Mark with a cross where applicable) 



III. RECEIPT AND ACCEPTANCE



This International Depositary Authority accepts the microorganism identified under I above, which was received by it on
1 July 1998 (date of the original deposit)¹



IV. RECEIPT OF REQUEST FOR CONVERSION 



The microorganism identified under I above was received by this International Depositary Authority on 

(date of the original deposit) and a request to convert the original deposit to a deposit under the Budapest Treaty was received by it on 

(date of receipt of request for conversion) 



V. INTERNATIONAL DEPOSITARY AUTHORITY



Name: NCIMB Ltd., 



Address: 23 St Machar Drive,
Aberdeen, 
AB24 3RY, 
Scotland. 



Signature(s) of person(s) having the power to represent the
International Depositary Authority or of authorised
official(s):




Date: [day illegible] July 1998



¹ Where Rule 6.4(d) applies, such date is the date on which the status of International Depositary Authority was acquired.

Form BP/4 (sole page) 
BUDAPEST TREATY ON THE INTERNATIONAL
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS
FOR THE PURPOSES OF PATENT PROCEDURE 



Professor P.F. Leadlay, 
Department of Biochemistry, 
University of Cambridge, 
80 Tennis Court Road,
Cambridge. 
CB2 1GA 



INTERNATIONAL FORM 

RECEIPT IN THE CASE OF AN ORIGINAL DEPOSIT 
issued pursuant to Rule 7.1 by the 
INTERNATIONAL DEPOSITARY AUTHORITY 
identified at the bottom of this page 



NAME AND ADDRESS 
OF DEPOSITOR 



IDENTIFICATION OF THE MICROORGANISM 



Identification reference given by the 
DEPOSITOR:

Escherichia coli
XL1-Blue MR (MO-CN02)



Accession number given by the 
INTERNATIONAL DEPOSITARY AUTHORITY: 



NCIMB 40958 



II. SCIENTIFIC DESCRIPTION AND/OR PROPOSED TAXONOMIC DESIGNATION



The microorganism identified under I above was accompanied by: 
O a scientific description 

a proposed taxonomic designation 
(Mark with a cross where applicable)



III. RECEIPT AND ACCEPTANCE 



This International Depositary Authority accepts the microorganism identified under I above, which was received by it on 
1 July 1998 (date of the original deposit)¹



IV. RECEIPT OF REQUEST FOR CONVERSION 



The microorganism identified under I above was received by this International Depositary Authority on 

(date of the original deposit) and a request to convert the original deposit to a deposit under the Budapest Treaty was received by it on

(date of receipt of request for conversion) 



V. INTERNATIONAL DEPOSITARY AUTHORITY 



Name: NCIMB Ltd., 



Address: 23 St Machar Drive,
Aberdeen, 
AB24 3RY, 
Scotland. 



Signature(s) of person(s) having the power to represent the 
International Depositary Authority or of authorised
official(s):





Date: [day illegible] July 1998



¹ Where Rule 6.4(d) applies, such date is the date on which the status of International Depositary Authority was acquired.

Form BP/4 (sole page) 



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BUDAPEST TREATY ON THE INTERNATIONAL 
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS 
FOR THE PURPOSES OF PATENT PROCEDURE 



Dr. P.F. Leadlay, 
Department of Biochemistry,
University of Cambridge, 
80 Tennis Court Road, 
Cambridge. 
CB2 1GA



NAME AND ADDRESS OF THE PARTY 
TO WHOM THE VIABILITY STATEMENT 
IS ISSUED 



INTERNATIONAL FORM 

VIABILITY STATEMENT 
issued pursuant to Rule 10.2 by the
INTERNATIONAL DEPOSITARY AUTHORITY
identified on the following page



I. DEPOSITOR


II. IDENTIFICATION OF THE MICROORGANISM


Name:

AS ABOVE 
Address: 


Accession number given by the 
INTERNATIONAL DEPOSITARY AUTHORITY: 
NCIMB 40956 

Date of the deposit or of the transfer¹:

1 July 1998 


III. VIABILITY STATEMENT


The viability of the microorganism identified under II above was tested on 1 July 1998². On that date, the said
microorganism was: 


[x]³ viable

[ ]³ no longer viable





¹ Indicate the date of the original deposit or, where a new deposit or a transfer has been made, the most recent relevant date (date
of the new deposit or date of the transfer).

² In the cases referred to in Rule 10.2(a)(ii) and (iii), refer to the most recent viability test.

³ Mark with a cross the applicable box.



Form BP/9 (first page) 



PCT/GB00/02072



IV. CONDITIONS UNDER WHICH THE VIABILITY TEST HAS BEEN PERFORMED⁴



V. INTERNATIONAL DEPOSITARY AUTHORITY 



Name: NCIMB Ltd., 

Address: 23 St Machar Drive, 
Aberdeen, 
AB24 3RY,
Scotland. 



Signature(s) of person(s) having the power 
to represent the International Depositary 
Authority or of authorised official(s):




Date: [day illegible] July 1998



⁴ Fill in if the information has been requested and if the results of the test were negative.
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BUDAPEST TREATY ON THE INTERNATIONAL 
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS 
FOR THE PURPOSES OF PATENT PROCEDURE 



Dr. P.F. Leadlay, 
Department of Biochemistry, 
University of Cambridge, 
80 Tennis Court Road, 
Cambridge. 
CB2 1GA



NAME AND ADDRESS OF THE PARTY 
TO WHOM THE VIABILITY STATEMENT 
IS ISSUED 



INTERNATIONAL FORM 

VIABILITY STATEMENT 
issued pursuant to Rule 10.2 by the
INTERNATIONAL DEPOSITARY AUTHORITY 
identified on the following page 



I. DEPOSITOR 


II. IDENTIFICATION OF THE MICROORGANISM 


Name: 

AS ABOVE 
Address: 


Accession number given by the 
INTERNATIONAL DEPOSITARY AUTHORITY: 
NCIMB 40957 

Date of the deposit or of the transfer¹:

1 July 1998


III. VIABILITY STATEMENT


The viability of the microorganism identified under II above was tested on 1 July 1998². On that date, the said
microorganism was: 


[x]³ viable

[ ]³ no longer viable





¹ Indicate the date of the original deposit or, where a new deposit or a transfer has been made, the most recent relevant date (date
of the new deposit or date of the transfer).

² In the cases referred to in Rule 10.2(a)(ii) and (iii), refer to the most recent viability test.

³ Mark with a cross the applicable box.



Form BP/9 (first page) 






IV. CONDITIONS UNDER WHICH THE VIABILITY TEST HAS BEEN PERFORMED⁴



V. INTERNATIONAL DEPOSITARY AUTHORITY 



Name: NCIMB Ltd.,

Address: 23 St Machar Drive, 
I. DEPOSITOR


II. IDENTIFICATION OF THE MICROORGANISM 


Name: 

AS ABOVE 
Address: 


Accession number given by the 
INTERNATIONAL DEPOSITARY AUTHORITY: 
NCIMB 40958 

Date of the deposit or of the transfer¹:

1 July 1998


III. VIABILITY STATEMENT 


The viability of the microorganism identified under II above was tested on 1 July 1998². On that date, the said
microorganism was: 


[x]³ viable

[ ]³ no longer viable





¹ Indicate the date of the original deposit or, where a new deposit or a transfer has been made, the most recent relevant date (date
of the new deposit or date of the transfer).

² In the cases referred to in Rule 10.2(a)(ii) and (iii), refer to the most recent viability test.

³ Mark with a cross the applicable box.
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DONOVAN M J ET AL.: "Isolation of DNA 
involved in monensin biosynthesis by 
Streptomyces cinnamonensis."
ABSTR. ANNU. MEET. AM. SOC. MICROBIOL. 88 MEET.,
1988, page 261 XP000949887
abstract 

ARROWSMITH T J ET AL.: "Characterisation
of actI-homologous DNA encoding polyketide
synthase genes from the monensin producer
Streptomyces cinnamonensis."
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which Is cited to establish the publication date of another 
citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or
other means

"P" document published prior to the international filing date but
later than the priority date claimed

"T" later document published after the international filing date
or priority date and not in conflict with the application but
cited to understand the principle or theory underlying the
invention

"X" document of particular relevance; the claimed invention
cannot be considered novel or cannot be considered to
involve an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention
cannot be considered to involve an inventive step when the
document is combined with one or more other such documents,
such combination being obvious to a person skilled
Streptomyces genes coding for synthesis of
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NATURE, 

vol. 325, no. 6107, 

26 February 1987 (1987-02-26), pages 

818-821, XP002075972 

abstract 

page 819, left-hand column, line 16 
-right-hand column, line 1; figure 1 

ASHWORTH D M ET AL.: "Selection of a 

specifically blocked mutant of 

Streptomyces cinnamonensis: isolation and
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THE JOURNAL OF ANTIBIOTICS. 

vol. 42, no. 7, July 1989 (1989-07), pages 

1088-1099, XP002149776 

cited in the application 

abstract 

page 1088, line 10-15
scheme 1,2 

WO 98 49315 A (KOSAN BIOSCIENCES INC ;UNIV 
LELAND STANFORD JUNIOR (US)) 
5 November 1998 (1998-11-05) 
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HOPWOOD D A: "Genetic contributions to 
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vol. 97, no. 7, November 1997 (1997-11), 
pages 2465-2497, XP002130647

figures 3,13 
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page 2486, paragraph C 

WO 98 01546 A (CORTES JESUS ;LEADLAY PETER 
F (GB); STAUNTON JAMES (GB) ; BIOTICA T) 
15 January 1998 (1998-01-15) 
Category Citation of document, with indication, where appropriate, of the relevant passages



Relevant to claim No. 



ZERBE-BURKHARDT K ET AL.: "Cloning, 
sequencing, expression, and insertional 
inactivation of the gene for the large 
subunit of the coenzyme B12-dependent 
isobutyryl-CoA mutase from Streptomyces 
cinnamonensis." 

JOURNAL OF BIOLOGICAL CHEMISTRY, 

vol. 273, no. 11, 

13 March 1998 (1998-03-13), pages 

6508-6517, XP002149755

abstract 

ROWE C J ET AL: "Construction of new 
vectors for high-level expression in 
actinomycetes" 
GENE, 

vol. 216, no. 1, August 1998 (1998-08), 
pages 215-223, XP004149299 
cited in the application 
abstract 

WO 00 00500 A (LEADLAY PETER FRANCIS
;CORTES JESUS (GB); STAUNTON JAMES (GB);
BIO) 6 January 2000 (2000-01-06)
Note: 100.0 % aa seq identity of SEQ ID
NO: 23 with SEQ ID NO: 19 in 920 aa overlap,
page 14, line 15-17
Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet)



This International Search Report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:
1. [ ] Claims Nos.:

because they relate to subject matter not required to be searched by this Authority, namely: 



2. [ ] Claims Nos.:

because they relate to parts of the International Application that do not comply with the prescribed requirements to such
an extent that no meaningful International Search can be carried out, specifically:



3. [ ] Claims Nos.:

because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:

see additional sheet 



1. [ ] As all required additional search fees were timely paid by the applicant, this International Search Report covers all
searchable claims.



2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment
of any additional fee.



3. [ ] As only some of the required additional search fees were timely paid by the applicant, this International Search Report
covers only those claims for which fees were paid, specifically claims Nos.:



4. [X] No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is
restricted to the invention first mentioned in the claims; it is covered by claims Nos.:

see further information sheet invention 1 



Remark on Protest [ ] The additional search fees were accompanied by the applicant's protest.

[ ] No protest accompanied the payment of additional search fees.



Form PCT/ISA/210 (continuation of first sheet (1)) (July 1998)



FURTHER INFORMATION CONTINUED FROM PCT/ISA/ 210 



This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: 1,8-12,14,43,44 (all partially); 2-7,13,15-42, 
45 (all completely) 



A DNA sequence comprising the complete monensin (mon) gene
cluster, or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with one of the peptides 
according to SEQ ID NOs 12-33 (AcpX to MonAX as set out in 
table II), provided that said polypeptide is not all or part 
of amino acid 1-920 encoded by monAI. Vectors, transformed 
cells, hybridization probes and their uses. 

Use of mon genes to control expression (monRI), to effect 
chain release (monAIX and monAX), to provide a desired 
stereochemical outcome (monBI and monBII), or to provide 
epoxidase or cyclase activity (monCI and monCII). Mon 
polypeptides having isomerase activity (MonBI and MonBII),
or having chain terminating activity (MonAIX or MonAX), or 
having epoxidase activity (MonCI), or having cyclase 
activity (MonCII). 

Processes for producing polyketides involving monensin 
loading or extension modules or domains. DNA sequences 
encoding hybrid polyketide synthases containing one or more 
monensin modules or domains (provided that it is not 
encoding an ery loading module, the first and second ery 
extension modules and the ery chain-terminating thioesterase
in which the AT domain of the first ery extension module has 
been substituted by the ethyl malonyl-CoA AT from the 
monensin synthase), polyketide synthases encoded by said DNA 
sequences, and polyketide compounds produced by said
polyketide synthases. Vectors and transformed cells. 

Methods of producing S. cinnamonensis capable of producing 
enhanced levels of monensin by overexpressing or amplifying 
the monRI gene, S. cinnamonensis strains produced thereby, 
and use of said strains in monensin production. 

Process for expressing a heterologous gene, e.g., a PKS 
gene, in S. cinnamonensis under the control of monRI. 



2. Claims: 1,8-12,14 (all partially)

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO: 5 (GdhA as set out in table II), vectors,
transformed cells, hybridization probes and their uses.
3. Claims: 1,8-12,14 (all partially)

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO: 6 (DapA as set out in table II), vectors,
transformed cells, hybridization probes and their uses.



4. Claims: 1,8-12,14 (all partially)

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO: 7 (Orf3 as set out in table II), vectors,
transformed cells, hybridization probes and their uses.



5. Claims: 1,8-12,14 (all partially)

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO: 8 (Orf4 as set out in table II), vectors,
transformed cells, hybridization probes and their uses.



6. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according 
to SEQ ID NO: 9 (Orf5 as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 



7. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according
to SEQ ID NO: 10 (Orf6 as set out in table II), vectors, 
transformed cells, hybridization probes and their uses. 



8. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according 
to SEQ ID NO: 11 (Orf7 as set out in table II), vectors, 
transformed cells, hybridization probes and their uses. 
9. Claims: 1,8-12,14 (all partially)

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO: 34 (Orf29 as set out in table II), vectors,
transformed cells, hybridization probes and their uses.



10. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO: 35 (LipB as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 



11. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according 
to SEQ ID NO: 36 (Orf31 as set out in table II), vectors, 
transformed cells, hybridization probes and their uses. 



12. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a
polypeptide (or part of it) which is at least 80%,
preferably at least 90% identical with a peptide according
to SEQ ID NO: 37 (Orf32 as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 



13. Claims: 1,8-12,14 (all partially) 

A DNA sequence or a part or a variant of it which encodes a 
polypeptide (or part of it) which is at least 80%, 
preferably at least 90% identical with a peptide according 
to SEQ ID NO: 38 (AmtA as set out in table II), vectors,
transformed cells, hybridization probes and their uses. 



14. Claims: 43,44 (both partially) 

Process for expressing a heterologous gene, e.g., a PKS 
gene, in S. cinnamonensis under the control of actII/orf4. 
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Applicant 

BIOTICA TECHNOLOGY LIMITED et al 
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This International Search Report consists of a total of 9 sheets. 

[ ] It is also accompanied by a copy of each prior art document cited in this report.

1. Basis of the report

a. With regard to the language, the international search was carried out on the basis of the international application in the
language in which it was filed, unless otherwise indicated under this item.

[ ] the international search was carried out on the basis of a translation of the international application furnished to this
Authority (Rule 23.1(b)).

b. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the international search
was carried out on the basis of the sequence listing:

[X] contained in the international application in written form.

[ ] filed together with the international application in computer readable form.

[ ] furnished subsequently to this Authority in written form.

[ ] furnished subsequently to this Authority in computer readable form.

[ ] the statement that the subsequently furnished written sequence listing does not go beyond the disclosure in the
international application as filed has been furnished.

[X] the statement that the information recorded in computer readable form is identical to the written sequence listing has been
furnished.

2. [ ] Certain claims were found unsearchable (See Box I).

3. [X] Unity of invention is lacking (see Box II).



4. With regard to the title,

[X] the text is approved as submitted by the applicant.

[ ] the text has been established by this Authority to read as follows:

5. With regard to the abstract,

[ ] the text is approved as submitted by the applicant.

[X] the text has been established, according to Rule 38.2(b), by this Authority as it appears in Box III. The applicant may,
within one month from the date of mailing of this international search report, submit comments to this Authority.

6. The figure of the drawings to be published with the abstract is Figure No. 1

[X] as suggested by the applicant.    [ ] None of the figures.

[ ] because the applicant failed to suggest a figure.

[ ] because this figure better characterizes the invention.
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see additional sheet 



1 . I I As all required additional search fees were timely paid by the applicant, this International Search Report covers all 
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of any additional fee. 
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see further information sheet invention 1 
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line 8 is modified as follows: 
removal of the word 'novel'. 
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If the applicant wishes to proceed with the international application in the national phase, he must, within 20 months 
or 30 months, or later in some Offices, perform the acts referred to therein before each designated or elected Office. 

For further important information on the time limits and acts to be performed for entering the national phase, see the 
Annex to Form PCT/IB/301 (Notification of Receipt of Record Copy) and Volume II of the PCT Applicant's Guide. 



The International Bureau of WIPO
34, chemin des Colombettes
1211 Geneva 20, Switzerland

Authorized officer

J. Zahra

Facsimile No. (41-22) 740.14.35

Telephone No. (41-22) 338.83.38
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4278174 



PCT

REQUEST 

The undersigned requests that the present 

international application be processed 
according to the Patent Cooperation Treaty 



For receiving Office use only 



PCT/GB00/02072

International Application No.

30 MAY 2000

International Filing Date

Name of receiving Office

United Kingdom Patent Office





Applicant's or agent's file reference 
(if desired) (12 characters maximum)



IS/BP5858469 



Box No. I 



TITLE OF INVENTION POLYKETIDES AND THEIR SYNTHESIS 



Box No. II



APPLICANT 



Name and address: (Family name followed by given name; for a legal entity, full official designation.
The address must include postal code and name of country. The country of the address indicated in this Box is 
the applicant's State (that is, country) of residence if no State of residence is indicated below.) 

BIOTICA TECHNOLOGY LIMITED 

181A HUNTINGDON ROAD 

CAMBRIDGE 

CB3 0DJ

GB 



I I This person is also inventor. 



Telephone No. 



Facsimile No. 



Teleprinter No. 



State (that is, country) of nationality: GB



State (that is, country) of residence: GB 



This person is applicant for the purposes of: [ ] all designated States  [X] all designated States except the United States of America  [ ] the United States of America only  [ ] the States indicated in the Supplemental Box



Box No. III FURTHER APPLICANT(S) AND/OR (FURTHER) INVENTOR(S)



Name and address: (Family name followed by given name; for a legal entity, full official designation. The
address must include postal code and name of country. The country of the address indicated in this Box is the
applicant's State (that is, country) of residence if no State of residence is indicated below.)

LEADLAY PETER FRANCIS 

17 CLARENDON ROAD 

CAMBRIDGE 

CB2 2BH 

GB 



This person is: 

[ ] applicant only

[X] applicant and inventor

[ ] inventor only (if this check-box is marked,
do not fill in below.)



State (that is, country) of nationality: GB 


State (that is, country) of residence: 


GB 


This person is applicant for the purposes of: [ ] all designated States  [ ] all designated States except the United States of America  [X] the United States of America only  [ ] the States indicated in the Supplemental Box


X Further applicants and/or (further) inventors are indicated on a continuation sheet. 


Box No. IV AGENT OR COMMON REPRESENTATIVE; OR ADDRESS FOR CORRESPONDENCE 


The person identified below is hereby/has been appointed to act on behalf of the 
applicant(s) before the competent International Authorities as: 


[X] agent

[ ] common representative

Name and address: (Family name followed by given name; for a legal entity, full official designation.
The address must include postal code and name of country.)

STUART, IAN (and others)
MEWBURN ELLIS
YORK HOUSE
23 KINGSWAY
LONDON WC2B 6HP
GB



Telephone No. 0117 9266411



Facsimile No. +44 20 7240 9339 



Teleprinter No. 



[ ] Mark this check-box where no agent or common representative is/has been appointed and the space above is used instead to indicate a
special address to which correspondence should be sent. 



Form PCT/RO/101 (first sheet) (January 2000) MEWBURN ELLIS 08.12.99
Sheet No. 2    PCT/GB00/02072

Continuation of Box No. III FURTHER APPLICANTS AND/OR (FURTHER) INVENTORS

If none of the following sub-boxes is used, this sheet is not to be included in the request.

Name and address: (Family name followed by given name; for a legal entity, full official designation.
The address must include postal code and name of country. The country of the address indicated in this Box is
the applicant's State (that is, country) of residence if no State of residence is indicated below.)

STAUNTON JAMES 
29 PORSON ROAD 
CAMBRIDGE 
CB2 2ET 
GB 


This person is: 

[ ] applicant only

[X] applicant and inventor

[ ] inventor only (if this check-box is marked,
do not fill in below.)

State (that is, country) of nationality: GB

State (that is, country) of residence: GB

This person is applicant for the purposes of: all designated States; all designated States except the United States of America; the United States of America only; the States indicated in the Supplemental Box


Name and address: (Family name followed by given name; for a legal entity, full official designation.
The address must include postal code and name of country. The country of the address indicated in this Box is
the applicant's State (that is, country) of residence if no State of residence is indicated below.)

OLIYNYK MARKO 

DEPARTMENT OF BIOCHEMISTRY 

CAMBRIDGE UNIVERSITY 

TENNIS COURT ROAD 

CAMBRIDGE 

CB2 1QW 

GB 


This person is: 

[ ] applicant only

[X] applicant and inventor

[ ] inventor only (if this check-box is marked,
do not fill in below.)

State (that is, country) of nationality: UA

State (that is, country) of residence: GB

This person is applicant for the purposes of: all designated States; all designated States except the United States of America; the United States of America only; the States indicated in the Supplemental Box


Name and address: (Family name followed by given name; for a legal entity, full official designation.
The address must include postal code and name of country. The country of the address indicated in this Box is
the applicant's State (that is, country) of residence if no State of residence is indicated below.)

This person is:

[ ] applicant only

[ ] applicant and inventor

[ ] inventor only (if this check-box is marked,
do not fill in below.)

State (that is, country) of nationality:

State (that is, country) of residence:

This person is applicant for the purposes of: [ ] all designated States  [ ] all designated States except the United States of America  [ ] the United States of America only  [ ] the States indicated in the Supplemental Box


Name and address: (Family name followed by given name; for a legal entity, full official designation.
The address must include postal code and name of country. The country of the address indicated in this Box is
the applicant's State (that is, country) of residence if no State of residence is indicated below.)

This person is:

[ ] applicant only

[ ] applicant and inventor

[ ] inventor only (if this check-box is marked,
do not fill in below.)

State (that is, country) of nationality:

State (that is, country) of residence:

This person is applicant for the purposes of: [ ] all designated States  [ ] all designated States except the United States of America  [ ] the United States of America only  [ ] the States indicated in the Supplemental Box

[ ] Further applicants and/or (further) inventors are indicated on another continuation sheet.
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Sheet No. 3 



Box No. V 



DESIGNATION OF STATES 



PCT/GB00/02072

The following designations are hereby made under Rule 4.9(a) (mark the applicable check-boxes; at least one must be marked):

Regional Patent



1] 



X 



X 



AP ARIPO Patent: GH Ghana, GM Gambia, KE Kenya, LS Lesotho, MW Malawi, SD Sudan, SL Sierra Leone, SZ Swaziland, UG Uganda, ZW
Zimbabwe, and any other State which is a Contracting State of the Harare Protocol and of the PCT

EA Eurasian Patent: AM Armenia, AZ Azerbaijan, BY Belarus, KG Kyrgyzstan, KZ Kazakstan, MD Republic of Moldova, RU Russian
Federation, TJ Tajikistan, TM Turkmenistan, and any other State which is a Contracting State of the Eurasian Patent Convention and of the PCT

EP European Patent: AT Austria, BE Belgium, CH and LI Switzerland and Liechtenstein, CY Cyprus, DE Germany, DK Denmark, ES Spain, FI
Finland, FR France, GB United Kingdom, GR Greece, IE Ireland, IT Italy, LU Luxembourg, MC Monaco, NL Netherlands, PT Portugal, SE
Sweden, and any other State which is a Contracting State of the European Patent Convention and of the PCT

OA OAPI Patent: BF Burkina Faso, BJ Benin, CF Central African Republic, CG Congo, CI Côte d'Ivoire, CM Cameroon, GA Gabon, GN Guinea,
GW Guinea-Bissau, ML Mali, MR Mauritania, NE Niger, SN Senegal, TD Chad, TG Togo, and any other State which is a member State of OAPI
and a Contracting State of the PCT (if other kind of protection or treatment desired, specify on dotted line)



X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 



National Patent (if other kind of protection desired, specify on dotted line):
AE United Arab Emirates

AL Albania

AM Armenia

AT Austria

AU Australia

AZ Azerbaijan

BA Bosnia & Herzegovina

BB Barbados

BG Bulgaria

BR Brazil

BY Belarus

CA Canada

CH and LI Switzerland and Liechtenstein 

CN China

CR Costa Rica 

CU Cuba 

CZ Czech Republic 

DE Germany 

DK Denmark 

DM Dominica 

EE Estonia 

ES Spain 

FI Finland 

GB United Kingdom
GD Grenada 

GE Georgia 

GH Ghana 

GM Gambia 

HR Croatia 

HU Hungary 

ID Indonesia 

IL Israel

IN India
IS Iceland

JP Japan

KE Kenya

KG Kyrgyzstan

KP Democratic People's Republic of Korea

KR Republic of Korea

KZ Kazakstan

LC St Lucia
LK Sri Lanka
LR Liberia
LS Lesotho



X 
X 
X 
X 
X 
X 
X 



LT Lithuania 
LU Luxembourg 
LV Latvia 
MA Morocco 

MD Republic of Moldova 

MG Madagascar 

MK The former Yugoslav Republic of Macedonia



X 
X 
X 
X 
X 
X 

X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 
X 



X 



MN Mongolia 

MW Malawi 

MX Mexico 

NO Norway 

NZ New Zealand 

PL Poland 

PT Portugal 

RO Romania 

RU Russian Federation 

SD Sudan 
SE Sweden 

SG Singapore

SI Slovenia

SK Slovakia 

SL Sierra Leone 

TJ Tajikistan 

TM Turkmenistan 

TR Turkey 

TT Trinidad and Tobago 

TZ Tanzania 

UA Ukraine 

UG Uganda 

US United States of America

UZ Uzbekistan

VN Viet Nam

YU Yugoslavia
ZA South Africa
ZW Zimbabwe



Check-boxes reserved for designating States which have become party to 
the PCT after issuance of this sheet: 



m 



DZ Algeria 

AG Antigua and Barbuda 
MZ Mozambique 



.......... Any other State which is party to the PCT



... designation which is not confirmed before the expiration of 15 months from the priority



Form PCT/RO/101 (second sheet) (January 2000)
Sheet No. 4    PCT/GB00/02072

Supplemental Box

If the Supplemental Box is not used, this sheet need not be included in the request.



Use this box in the following cases:

1. If, in any of the Boxes, the space is insufficient to furnish all the information:
   in such case, write "Continuation of Box No. ..." (indicate the number of
   the Box) and furnish the information in the same manner as required
   according to the captions of the Box in which the space was insufficient;

   in particular:

   (i) if more than two persons are involved as applicants
   and/or inventors and no "continuation sheet" is available:
      in such case, write "Continuation of Box III" and indicate for each
      additional person the same type of information as required in Box No. III.
      The country of the address indicated in this box is the applicant's State (that
      is, country) of residence if no State of residence is indicated below;

   (ii) if, in Box No. II or in any of the sub-boxes of Box No.
   III, the indication "the States indicated in the
   Supplemental Box" is checked:
      in such case, write "Continuation of Box No. II" or "Continuation of Box
      No. III" or "Continuation of Boxes No. II and No. III" (as the case may be),
      indicate the name of the applicant(s) involved and, next to (each) such name,
      the State(s) (and/or, where applicable, ARIPO, Eurasian, European or
      OAPI patent) for the purposes of which the named person is applicant;

   (iii) if, in Box No. II or in any of the sub-boxes of Box No.
   III, the inventor or the inventor/applicant is not
   inventor for the purposes of all designated States or
   for the purposes of the United States of America:
      in such case, write "Continuation of Box No. II" or "Continuation of Box
      No. III" or "Continuation of Boxes No. II and No. III" (as the case may be),
      indicate the name of the inventor(s) and, next to (each) such name, the
      State(s) (and/or, where applicable, ARIPO, Eurasian, European or OAPI
      patent) for the purposes of which the named person is inventor;

   (iv) if, in addition to the agent(s) indicated in Box No. IV,
   there are further agents:
      in such case, write "Continuation of Box No. IV" and indicate for each
      further agent the same type of information as required in Box No. IV;

   (v) if, in Box No. V, the name of any State (or OAPI) is
   accompanied by the indication "patent of addition," or
   "certificate of addition," or if, in Box No. V, the name
   of the United States of America is accompanied by an
   indication "Continuation" or "Continuation-in-part":
      in such case, write "Continuation of Box No. V" and the name of each State
      involved (or OAPI), and after the name of each such State (or OAPI), the
      number of the parent title or parent application and the date of grant of the
      parent title or filing of the parent application;

   (vi) if, in Box No. VI, there are more than three earlier
   applications whose priority is claimed:
      in such case, write "Continuation of Box No. VI" and indicate for each
      additional earlier application the same type of information as required in
      Box No. VI;

   (vii) if, in Box No. VI, the earlier application is an ARIPO
   application:
      in such case, write "Continuation of Box No. VI", specify the number of the
      item corresponding to that earlier application and indicate at least one
      country party to the Paris Convention for the Protection of Industrial
      Property for which that earlier application was filed.

2. If, with regard to the precautionary designation
statement contained in Box No. V, the applicant wishes to
exclude any State(s) from the scope of that statement:
   in such case, write "Designation(s) excluded from precautionary
   designation statement" and indicate the name or two-letter code of each
   State so excluded.

3. If the applicant claims, in respect of any designated
Office, the benefits of provisions of the national law
concerning non-prejudicial disclosures or exceptions to lack
of novelty:
   in such case, write "Statement Concerning Non-Prejudicial Disclosures or
   Exceptions to Lack of Novelty" and furnish that statement below.

Continuation of Box IV

ARMITAGE, IAN M.
BRASNETT, ADRIAN H.
CALDERBANK, T. ROGER
CARTER, STEPHEN
COLEIRO, RAYMOND
CRIPPS, JOANNA E.
FORD, MICHAEL F.
HACKNEY, NIGEL J.
HARRISON, DAVID C.
KIDDLE, SIMON J.
KREMER, SIMON M.
LYONS, JUNE M.
NICHOLLS, KATHRYN M.
PAGET, HUGH C.E.
SANDERSON, MICHAEL J.
STONER, G. PATRICK
STUART, IAN
WALTON, SEAN M.
WATSON, ROBERT J.



Continuation of Box No. ? 
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Filing date 
of earlier application 

(day/month/year) 


Number 
of earlier application 


Where earlier application is; 


national application: 
country 


regional application:* 
regional Office 


international application: 
receiving Office


item (1) 

28/05/99


9912563.5 


GB 






item (2) 










item (3) 











Sheet No. 5 



Box No. VI 



PRIORITY CLAIM 



PCT/GB00/02072

[ ] Further priority claims are indicated in the Supplemental Box



The receiving Office is requested to prepare and transmit to the International Bureau a certified copy 
of the earlier application(s) (only if the earlier application was filed with the Office which for the 
purposes of the present international application is the receiving Office) identified above as item(s): (1)



* Where the earlier application is an ARIPO application, it is mandatory to indicate in the Supplemental Box at least one country party to the Paris Convention
for the Protection of Industrial Property for which that earlier application was filed (Rule 4.10(b)(ii)). See Supplemental Box.



Box No. VII 



INTERNATIONAL SEARCHING AUTHORITY 



Choice of International Searching Authority (ISA)

(If two or more International Searching Authorities
are competent to carry out the international search, indicate the
Authority chosen; the two-letter code may be used):

ISA/ 



Request to use results of earlier search; reference to that search (if an earlier search
has been carried out by or requested from the International Searching Authority): 



Date (day/month/year) 



Number 



Country (or regional Office) 



Box No. VIII 



CHECK LIST; LANGUAGE OF FILING 



This international application 
contains the following number 
of sheets 

request 

description (excluding 
sequence listing part) 

claims 

abstract 

drawings 

sequence listing part of 
description 



Total number of sheets 



99 
10 
1 
4 



75* 



This international application is accompanied by the item(s) marked below: 

1 . fee calculation sheet 

2. separate signed power of attorney 

3. tX I copy of general power of attorney; reference number, if any: (x 3) 

4. statement explaining lack of signature 

5. |0 I priority document(s) identified in Box No. VI as item(s):
6. □ translation of international application into (language):

7. ☒ separate indications concerning deposited microorganisms or other biological

matter 

8. nucleotide and/or amino acid sequence listing in computer readable form 

9. ^ other (specify): 23/77 



Figure of the drawings which
should accompany the abstract: 1



Language of filing of the 

international application: ENGLISH 



Box No. IX 



SIGNATURE OF APPLICANT OR AGENT 



Next to each signature indicate the name of the person signing and the capacity in which the person signs (if such capacity is not obvious from reading the request). 



STUART, IAN 
APPOINTED AGENT 



1. Date of actual receipt of the purported
international application: 30 MAY 2000


2. Drawings:
☒ received:

□ not received:


3. Corrected date of actual receipt due to later but 
timely received papers or drawings completing 
the purported international application: 


4. Date of timely receipt of the required corrections 
under PCT Article 11(2): 


5. International Searching Authority (if two or more 
are competent): ISA/ 


6. ☒ Transmittal of search copy delayed
until search fee is paid 



For International Bureau use only 

21 JUNE 2000

Form PCT/RO/101 (last sheet) (January 2000) 



Date of receipt of the record copy 
by the International Bureau: 



(21.06.00)



MEWBURN ELLIS 08.12.99 



See Notes to the request form 



REC'D 30 AUG 2001
WIPO PCT



PATENT COOPERATION TREATY

PCT 

INTERNATIONAL PRELIMINARY EXAMINATION REPORT 

(PCT Article 36 and Rule 70) 



Applicant's or agent's file reference 
IS/BP5858469 


See Notification of Transmittal of international 
FOR FURTHER ACTION Preliminary Examination Report (Form PCT/IPEA/416) 


International application No. 
PCT/GB00/02072


International filing date (day/month/year) 
30/05/2000 


Priority date (day/month/year) 
28/05/1999 


International Patent Classification (IPC) or national classification and IPC 
C12N15/52 


Applicant 

BIOTICA TECHNOLOGY LIMITED et al. 



1. This international preliminary examination report has been prepared by this International Preliminary Examining Authority
and is transmitted to the applicant according to Article 36. 

2. This REPORT consists of a total of 9 sheets, including this cover sheet. 

□ This report is also accompanied by ANNEXES, i.e. sheets of the description, claims and/or drawings which have 
been amended and are the basis for this report and/or sheets containing rectifications made before this Authority 
(see Rule 70.16 and Section 607 of the Administrative Instructions under the PCT). 

These annexes consist of a total of sheets. 



3. This report contains indications relating to the following items:

I ☒ Basis of the report
II ☒ Priority
III ☒ Non-establishment of opinion with regard to novelty, inventive step and industrial applicability
IV ☒ Lack of unity of invention
V □ Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability;
citations and explanations supporting such statement
VI ☒ Certain documents cited
VII □ Certain defects in the international application
VIII ☒ Certain observations on the international application


Date of submission of the demand 
27/12/2000 



Date of completion of this report 
28.08.2001 



Name and mailing address of the international 
preliminary examining authority: 
European Patent Office 

D-80298 Munich 
Tel. +49 89 2399 - 0 Tx: 523656 epmu d 

Fax: +49 89 2399 - 4465 



Authorized officer 
Roscoe, R 

Telephone No. +49 89 2399 2554 



Form PCT/IPEA/409 (cover sheet) (January 1994)



INTERNATIONAL PRELIMINARY 
EXAMINATION REPORT 



International application No. PCT/GB00/02072



I. Basis of the report 

1. With regard to the elements of the international application (Replacement sheets which have been furnished to
the receiving Office in response to an invitation under Article 14 are referred to in this report as "originally filed"
and are not annexed to this report since they do not contain amendments (Rules 70.16 and 70.17)):
Description, pages:

1-173 as originally filed

Claims, No.:

1-45 as originally filed

Drawings, sheets: 

1/4-4/4 as originally filed 

Sequence listing part of the description, pages: 

1-80, filed with the letter of 29.09.00 

2. With regard to the language, all the elements marked above were available or furnished to this Authority in the 
language in which the international application was filed, unless otherwise indicated under this item. 

These elements were available or furnished to this Authority in the following language: , which is: 

□ the language of a translation furnished for the purposes of the international search (under Rule 23.1(b)).

□ the language of publication of the international application (under Rule 48.3(b)). 

□ the language of a translation furnished for the purposes of international preliminary examination (under Rule 
55.2 and/or 55.3). 

3. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the 
International preliminary examination was carried out on the basis of the sequence listing: 

□ contained in the international application in written form. 

□ filed together with the international application in computer readable form. 
☒ furnished subsequently to this Authority in written form.

☒ furnished subsequently to this Authority in computer readable form.

□ The statement that the subsequently furnished written sequence listing does not go beyond the disclosure in
the international application as filed has been furnished.

☒ The statement that the information recorded in computer readable form is identical to the written sequence
listing has been furnished.

4. The amendments have resulted in the cancellation of: 



Form PCT/IPEA/409 (Boxes I-VIII, Sheet 1) (July 1998)






□ the description, pages:

□ the claims, Nos.:

□ the drawings, sheets:


5. □ This report has been established as if (some of) the amendments had not been made, since they have been 
considered to go beyond the disclosure as filed (Rule 70.2(c)): 

(Any replacement sheet containing such amendments must be referred to under Item 1 and annexed to this 
report.) 



6. Additional observations, if necessary: 



II. Priority

1. □ This report has been established as if no priority had been claimed due to the failure to furnish within the
prescribed time limit the requested:

□ copy of the earlier application whose priority has been claimed.

□ translation of the earlier application whose priority has been claimed.

2. □ This report has been established as if no priority had been claimed due to the fact that the priority claim has
been found invalid.

Thus for the purposes of this report, the international filing date indicated above is considered to be the relevant
date.

3. Additional observations, if necessary:
see separate sheet

III. Non-establishment of opinion with regard to novelty, inventive step and industrial applicability

1. The questions whether the claimed invention appears to be novel, to involve an inventive step (to be non-
obvious), or to be industrially applicable have not been examined in respect of:

☒ the entire international application.
□ claims Nos. .

because:

□ the said international application, or the said claims Nos. relate to the following subject matter which does
not require an international preliminary examination (specify):

☒ the description, claims or drawings (indicate particular elements below) or said claims Nos. 1-45 are so
unclear that no meaningful opinion could be formed (specify):
see separate sheet

Form PCT/IPEA/409 (Boxes I-VIII, Sheet 2) (July 1998)






□ the claims, or said claims Nos. are so inadequately supported by the description that no meaningful opinion 
could be formed. 

☒ no international search report has been established for the said claims Nos. 1, 8-12, 14, 43, 44 (all partially).

2. A meaningful international preliminary examination cannot be carried out due to the failure of the nucleotide 
and/or amino acid sequence listing to comply with the standard provided for in Annex C of the Administrative 
Instructions: 

□ the written form has not been furnished or does not comply with the standard. 

□ the computer readable form has not been furnished or does not comply with the standard. 

IV. Lack of unity of invention 

1. In response to the invitation to restrict or pay additional fees the applicant has:

□ restricted the claims. 

□ paid additional fees. 

□ paid additional fees under protest. 

□ neither restricted nor paid additional fees. 

2. □ This Authority found that the requirement of unity of invention is not complied with and chose, according to Rule
68.1, not to invite the applicant to restrict or pay additional fees.

3. This Authority considers that the requirement of unity of invention in accordance with Rules 13.1, 13.2 and 13.3 is

□ complied with.

□ not complied with for the following reasons: 

4. Consequently, the following parts of the international application were the subject of international preliminary 
examination in establishing this report: 

□ all parts. 

☒ the parts relating to claims Nos. 1, 8-12, 14, 43, 44 (all partially); 2-7, 13, 15-42, 45 (all completely).

VI. Certain documents cited 

1. Certain published documents (Rule 70.10) 

and / or 

2. Non-written disclosures (Rule 70.9) 
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# 




see separate sheet



VIII. Certain observations on the international application 

The following observations on the clarity of the claims, description, and drawings or on the question whether the
claims are fully supported by the description, are made: 
see separate sheet 
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The documents mentioned in the present International Preliminary Examination 
Report are numbered as in the search report, i.e. D1 corresponds to the first 
document of the search report etc. 

II. Priority 

As the priority document was not available to the IPEA, this opinion / report has 
been established based upon the assumption that priority was valid. Should this 
later not turn out to be the case, then D10 may become relevant to the
assessment of the present claims. 

III. No Opinion 

No opinion could be expressed for claims insofar as they relate to unsearched 
subject-matter (see section IV). Hence, no opinion has been formulated for claims 
1, 8-12, 14, 43, 44 (all partially). 

Further, the present set of claims as a whole is considered unclear, since the 
claimed subject-matter is not clearly defined. The reasons for this are set out in 
section VIII. Hence, no opinion is expressed for any of the present claims. 

IV. Lack of Unity 

The present application lacks unity and can be divided into 14 different invention
groups as set out in the Annex to the International Search Report. The reasoning 
for the lack of unity was set out in the invitation to pay additional fees. The 
International Preliminary Examination Authority agrees with this reasoning. Since 
applicant failed to pay additional Search Fees, only invention group I can be 
subject to Preliminary Examination. 

Preliminary statement on Novelty, Inventive Step and Industrial Applicability

For the benefit of the applicant, the authorized authority has decided to provide a
basic indication of the novelty, inventive step and industrial applicability of
applicant's monensin gene cluster. Individual claims will, however, not be
addressed due to the lack of clarity of the claims.

Novelty (Art.33(2) PCT) 

None of the cited prior art documents discloses the monensin gene cluster.

Inventive Step (Art.33(3) PCT)

The motivation to isolate the monensin gene cluster is evidenced by prior art
attempts using actI probes (e.g. D1 and D2). The data were, however, inconclusive,
although complementation data from D1 could be taken to demonstrate partial
isolation of the cluster. Further, in view of the conflicting data, D2 (last paragraph)
suggests that, in the context of the search for monensin, one should test whether
eryA-homologous DNA is found in S. cinnamonensis. D6 also suggests that
probes based on ery genes should be used for the isolation of modular PKSs
(p.2470, col.2). It is indeed the methodology suggested in D2 and D6 which
applicant used to isolate the monensin gene cluster. Hence, applicant has merely
put the teaching of D2 or D6 into practice. Given the monensin gene cluster, the
uses thereof, e.g. for the construction of hybrid PKSs, are routine (this has been
practiced on equivalent PKS enzymes). Hence, the isolation and uses of the
monensin gene cluster which have been searched are not considered inventive.

Industrial Applicability (Art.33(4) PCT) 

The present claims appear to have industrial applicability. 

VI. Certain documents 

In accordance with Rule 70.10 PCT, applicant's attention is drawn to the following
document(s): 

WO-A-00/00500 (Publication date, 06.01.00; Priority date, 29.06.98; Filing date,
29.06.99) 
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VIII. Certain observations 
Clarity (Art.6 PCT) 

The requirement that the claims should be clear does not only apply to individual 
claims but applies to the claims as a whole. The problems listed below 
demonstrate how extensive the lack of clarity is in the present claims. Due to this 
lack of clarity, no examination of the present set of claims can be reasonably 
carried out. 

Claim 1 - "at least part of" - size?

Claim 2 - "monensin gene cluster" - define by technical features / "variant" - how 
much variation ? 

Claim 3 - "part of" / "allele, mutation or other variant" - define each term 

Claim 4 - "at least part" / "monBI,..." - arbitrary definitions, need to be defined by 
reference to sequences. 

Claim 6 - "as set out in the appended sequence data" - show where 

Claim 11 - "corresponding polypeptide" - the DNA it comes from is defined in an
open-ended manner, so the polypeptide needs to be defined more clearly.

Claim 16 - "binds specifically" - under which conditions ? 

Claim 18 - "gene responsible for levels of activity" - definition by result to be 
achieved. Same applies to subsequent manipulation. 

Claims 21-23 - preferences should be defined in dependent claims

Claim 36 - product-by-process definition. Not acceptable, since a combination of
known PKS modules could arrive at the same compounds as can be produced by
the broadly defined synthase of claim 35. Major novelty problem.

The above problems are found in numerous claims, yet only the first incidence of
the problem has generally been referred to. Claims 1-8, 11, 16, 18-27, 30-32, 34,
36, 39-41 and 43 all contain clarity problems of the types listed above.
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